Dinand Tinholt
4th October 2023

To drive business value, it is important to leverage all the data from within your organization as well as from partners outside of it. Such a collaborative data ecosystem is an alignment of business goals, data, and technology among one or more participants to collectively create value that is greater than each can create individually. It involves not only combining that data but also collaborating on it.

With a little help from your friends

John Lennon and Paul McCartney met by chance in 1957, when Lennon’s band, The Quarrymen, was performing in Liverpool. McCartney then joined The Quarrymen. By the time Brian Epstein, then the manager of a local record store, discovered the band by chance, it had already changed its name to The Beatles; Epstein became the band’s manager in 1962.

The way we see data ecosystems is similar: sometimes it begins with a chance encounter, followed by bringing various elements together. We could have chosen the well-known 1969 Beatles song “Come Together” as the unifying theme of this article, but instead let’s pick another one: “With a Little Help from My Friends,” released in 1967. In the context of this story, a little help comes in the form of a little data. Bringing together data from your friends (customers, suppliers, partners, vendors, whoever) is what we would call “organized serendipity.”

Imagine you’re a retailer operating in a competitive market. You need to stay on top of trends and make sure your shelves (whether physical or virtual) are stocked and appealing to your customers. Out-of-stocks, for example, remain the single largest problem in retail. Keeping products stocked involves a complex value chain that must anticipate and respond to dynamic market forces: extreme weather, local events, and even activity from social influencers can quickly alter the demand for a product. In an optimal world, suppliers, distributors, retailers, and other partners would have visibility into changing dynamics and consumption in real time, enabling them to optimize their operational decisions on the fly. Yet supply chains across retail and consumer goods still operate much as they have for decades, making decisions on data that is days or weeks old. It is this delay between a change in demand and the ability to respond to it that leads to out-of-stocks.

The main sources of retail data are the retailer’s own operations, its ecosystem of partners, syndicated sources of competitive data, and external environmental data from government and commercial sources.

  • Retail operational data comes as a result of business operations, and includes everything from customer-facing retail sales data, advertising, e-commerce, customer support, reviews, and loyalty to back-of-house data from inventory, distribution, planning, and other management systems.
  • Retailers operate in a complex value chain, with data coming upstream from suppliers, wholesalers, and distributors, and integrating downstream with advertising and delivery partners.
  • Competitive data sources help retailers understand how their key competitors are operating in similar areas. Competitive distribution, assortment, pricing, promotions and advertising, sales, and other sources help retailers index their performance.
  • Environmental data helps retailers understand the context in which consumers are making decisions. This includes weather, local economic forces, census information, local events and foot-traffic data, legal and regulatory changes, social data, keyword searches, and more.

Finding a cost-effective technology

No two organizations leverage the same data in the same way. Their strategies, operations, competitors, geography, and supporting systems differ, each shaped to help that particular company succeed. As a result, no two businesses have the same data ecosystem. Companies may exchange data in key areas, but increasingly the differences in data between companies are perceived as a competitive advantage. Legacy data-sharing technologies were designed to support the lowest common denominator of collaboration and have struggled to meet the needs of real-time data sharing, data quality, governance, and decision-making. Companies want the flexibility to communicate in real time, with a variety of information, and across platforms.

The key to achieving this is to select a cost-effective technology that enables the broadest range of sharing options without proprietary technology or vendor lock-in, facilitates real-time data sharing and collaboration, ensures control over data quality and governance, and lets companies focus on immediately leveraging all types of data to drive better decisions.

A retail lakehouse simplifies collaboration

A data lakehouse is a modern data-management architecture that combines the features of both data lakes and data warehouses. It is a unified platform for storing, processing, analyzing, and sharing large volumes of data, both structured and unstructured, in its native format, with support for batch and real-time data processing.

Databricks’ Lakehouse is built on open standards and open source, which avoids proprietary lock-in. Importantly, this extends to data sharing and collaboration: Delta Sharing, an open-source project started by Databricks, allows companies to share large-scale, real-time data across organizations securely and efficiently.

A Lakehouse is an optimal approach to data collaboration because it addresses critical needs in retail:

  • Real-time collaboration. Companies can share data that is continuously updated, and Delta Sharing lets them do so without moving the data.
  • Collaboration on all of your data. Unlike legacy systems, Delta Sharing enables companies to share images, video, data-science models, structured data, and all other types of data.
  • Centralized data storage. The Lakehouse architecture makes it easier for different users or groups to access and share data from a single source of truth, eliminating data silos and enabling seamless data sharing across stakeholders.
  • Quality and compliance. A Lakehouse architecture helps ensure data integrity, traceability, and compliance with regulatory requirements, which are important considerations when sharing data with external users or organizations.
  • Simplified data management and discovery. The Lakehouse architecture includes a robust data catalog and metadata management system that helps document and organize data assets.

With Delta Sharing, companies can securely share data with other organizations without having to copy or move data across different systems. Delta Sharing uses a federated model, which means that data remains in the original location and is accessed remotely by the recipient organization. This approach allows organizations to maintain control over their data while still sharing it with others.
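To make the share-by-reference idea above more concrete, here is a minimal toy sketch in Python. It is not the Delta Sharing API or protocol: `ProviderStorage`, `SharingServer`, `grant`, `request_access`, `read`, and `revoke` are hypothetical names chosen for illustration. The sketch assumes the provider keeps its data in its own storage and hands recipients short-lived tokens that are resolved against that storage at read time. Updates therefore become visible without re-exporting anything, and access can be revoked centrally. The open Delta Sharing protocol follows a broadly similar pattern, with its server returning short-lived pre-signed URLs to the underlying data files rather than copies of the data.

```python
import secrets
import time
from dataclasses import dataclass, field

# Toy model of "share by reference". All names are hypothetical and
# this is NOT the Delta Sharing API or wire protocol.


@dataclass
class ProviderStorage:
    """Stands in for the provider's own storage; shared data stays here."""
    files: dict = field(default_factory=dict)  # path -> list of row dicts

    def append(self, path, rows):
        self.files.setdefault(path, []).extend(rows)


@dataclass
class SharingServer:
    """Checks grants and hands out short-lived tokens instead of copies."""
    storage: ProviderStorage
    ttl_seconds: float = 60.0
    grants: dict = field(default_factory=dict)  # recipient -> set of paths
    tokens: dict = field(default_factory=dict)  # token -> (recipient, path, expiry)

    def grant(self, recipient, path):
        self.grants.setdefault(recipient, set()).add(path)

    def revoke(self, recipient, path):
        self.grants.get(recipient, set()).discard(path)

    def request_access(self, recipient, path):
        if path not in self.grants.get(recipient, set()):
            raise PermissionError(f"{recipient} has no grant for {path}")
        token = secrets.token_hex(8)
        self.tokens[token] = (recipient, path, time.monotonic() + self.ttl_seconds)
        return token

    def read(self, token):
        # Resolve the token against the provider's storage at read time,
        # so the recipient always sees the current state of the data.
        recipient, path, expiry = self.tokens.get(token, (None, None, 0.0))
        if recipient is None or time.monotonic() > expiry:
            raise PermissionError("token unknown or expired")
        # Re-check the grant on every read so revocation takes effect at once.
        if path not in self.grants.get(recipient, set()):
            raise PermissionError(f"access to {path} has been revoked")
        # Shallow copy so the recipient cannot mutate the provider's list.
        return list(self.storage.files.get(path, []))


if __name__ == "__main__":
    storage = ProviderStorage()
    storage.append("sales/store_42", [{"sku": "A1", "units": 5}])
    server = SharingServer(storage)
    server.grant("supplier_x", "sales/store_42")

    token = server.request_access("supplier_x", "sales/store_42")
    print(len(server.read(token)))  # 1

    # The provider updates its own data; nothing is re-exported.
    storage.append("sales/store_42", [{"sku": "A1", "units": 3}])
    print(len(server.read(token)))  # 2

    server.revoke("supplier_x", "sales/store_42")
    try:
        server.read(token)
    except PermissionError as err:
        print(err)  # access to sales/store_42 has been revoked
```

In the usage at the bottom, a hypothetical supplier reads a store's sales rows through a token, the retailer appends a row in its own storage, and the next read through the same token sees both rows without any dataset being shipped. Revoking the grant blocks further reads immediately. That is the kind of control the federated model is meant to preserve for the data owner.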

Collaborative data ecosystems hold immense potential for retail companies looking to thrive in an increasingly competitive and data-driven industry. By leveraging these ecosystems, retailers can optimize their supply chain, gain valuable customer insights, make informed decisions, foster collaboration, and ensure data security and compliance. As more organizations recognize the value of such ecosystems, we can expect the retail industry to become even more connected, efficient, and customer-centric.



By leveraging data from within and outside their organization, businesses can create collective value that surpasses individual capabilities, fostering collaboration and innovation.


Outdated supply chains hinder retailers from effectively responding to dynamic market forces, making real-time data sharing imperative for optimizing operational decisions and reducing out-of-stock issues.


A modern data-management approach, the Lakehouse architecture combines data lakes and data warehouses, enabling real-time collaboration, centralized storage, and simplified data management for improved decision-making.


Delta Sharing, an open-source project, empowers companies to securely share large-scale, real-time data without data movement, unlocking the potential for seamless collaboration, compliance, and valuable insights in the retail industry.

Interesting read?

Capgemini’s innovation publication, Data-powered Innovation Review | Wave 6, features 19 such fascinating articles, crafted by leading experts from Capgemini and key technology partners like Google, Starburst, Microsoft, Snowflake, and Databricks. Learn about generative AI, collaborative data ecosystems, and how data and AI can enable the biodiversity of urban forests.

Dinand Tinholt

Vice President, Insights & Data, Capgemini
“Even while investment levels in data and AI initiatives are increasing, organizations continue to struggle to become data-powered. Many have yet to forge a supportive culture and a large number are not managing data as a business asset. For many firms, people and process challenges are the biggest barriers in activating data across the enterprise.”

Rob Saker

VP Global Retail & Manufacturing, Databricks  
Rob Saker has a proven track record of bending the curve on digital transformation, changing how companies embrace emerging digital and analytic capabilities. He has helped customers generate billions in new revenue and savings through data and AI capabilities.

Reshma Bhatt

CP and Retail Industry Lead, Insights & Data, Capgemini 
Reshma Bhatt is an accomplished and value-driven professional with 20+ years of leadership and delivery experience and a record of successfully delivering regional and global initiatives across various industry verticals. A passionate data enthusiast, she has experience in BI & Analytics, SharePoint, architecture, and Azure cloud migration, and a keen interest in AI and machine learning.