Why Snowflake is a good match for implementing Data Mesh

Martin Lam
May 10, 2021

So what’s the connection between Data Mesh and Snowflake? Or maybe we should begin by explaining what they even mean in the first place. But before you start reading this piece, just picture a snowflake in your mind – a one-of-a-kind symmetrical crystal reflecting the perfect internal order of water molecules arranged in predetermined positions. Done? Now let’s begin.

“Meshing” up data platforms is common

While most modern enterprises consider themselves data-powered or data-driven, few know how to address the challenges in the way we build data platforms. A robust data architecture might enable better business intelligence, but it is often a huge task to operate a central data lake, a central curated data area, and a serving area with one big, central team of data engineers and data scientists. The challenge only multiplies as the number of data sources in the platform and the demand for data use cases grow. Added to this burden are unclear data ownership and suboptimal communication between the team developing the data platform, the teams who build the source systems, and the business users who consume the platform’s data. Moreover, the speed of response to change often falls short of expectations. But it need not always be so complicated or ‘meshed up’.

What is Data Mesh?

When Zhamak Dehghani, director of emerging technologies at ThoughtWorks in North America, suggested a new data platform architecture that addresses these dimensions, she called it a Data Mesh. According to Zhamak, a Data Mesh, just like an interlaced network, is based on the following four principles:

  • Domain-oriented decentralized data ownership and architecture:

So that the ecosystem creating and consuming data can scale out as the number of sources of data, number of use cases, and diversity of access models to the data increases; simply increase the autonomous nodes on the mesh.

  • Data as a product:

So that data users can easily discover, understand, and securely use high-quality data with a delightful experience – data that is distributed across many domains.

  • Self-serve data infrastructure as a platform:

So that the domain teams can create and consume data products autonomously using the platform abstractions, hiding the complexity of building, executing, and maintaining secure and interoperable data products.

  • Federated computational governance:

So that data users can get value from the aggregation and correlation of independent data products – the mesh behaves as an ecosystem following global interoperability standards; standards that are built into the platform.

What is Snowflake?

The Snowflake platform powers the Data Cloud – uniquely designed to connect businesses globally, at any scale and across any industry. Its unified architecture enables customers to integrate any type of data from a wide range of sources and use it to power use cases across many different workloads, making it one of the fastest-growing data platforms on the market at the moment. It has been built from the ground up to be cloud-native, and it offers many innovative capabilities to meet the data analytics challenges that many enterprises face today.


Why is Snowflake a good match for Data Mesh?

Easy to share data: One of the challenges with Data Mesh is how data built as products by separate cross-functional teams can be distributed across many domains in a timely manner and at a reasonable cost. Snowflake has a unique way to share data thanks to its multi-cluster shared data architecture, exposed through Secure Data Sharing and the Snowflake Data Marketplace. A data producer can give data consumers access to live data within minutes, without copying or moving the data, and the consumer can query the shared data instantly. Snowflake simply provides a better way to share data.

Data can be shared both with your internal teams and departments and with external organizations, such as your customers and partners. Sharing data helps improve customer satisfaction, increase transparency, and boost business performance. That’s exactly one of the most important objectives of Data Mesh.
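As a rough illustration of how this looks in practice, the following sketch uses Snowflake’s Secure Data Sharing SQL. The database, schema, table, and account names are hypothetical placeholders, not part of the original article:

```sql
-- Producer account: publish a curated data product as a share
-- (all object names below are illustrative)
CREATE SHARE sales_data_product;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_data_product;
GRANT USAGE ON SCHEMA sales_db.curated TO SHARE sales_data_product;
GRANT SELECT ON TABLE sales_db.curated.orders TO SHARE sales_data_product;
ALTER SHARE sales_data_product ADD ACCOUNTS = consumer_org.consumer_account;

-- Consumer account: mount the share as a read-only database,
-- querying the producer's live data without any copy or pipeline
CREATE DATABASE sales_data FROM SHARE producer_org.producer_account.sales_data_product;
SELECT COUNT(*) FROM sales_data.curated.orders;
```

Because the consumer reads the producer’s storage directly, there is no replication lag and no extra storage cost on the consumer side – the data product is live the moment the grant is made.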

Simplified and reduced time to market: Data sharing traditionally happens through methods such as APIs, FTP, and cloud bucket storage, which are far more complex and costly. With Snowflake, you can substantially reduce both cost and time to market. You can also easily monitor how your data is used, and understand who accesses it and when. You can learn which data is used most by your organization, partners, and customers.
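For the monitoring part, Snowflake exposes usage metadata through views in the `SNOWFLAKE.ACCOUNT_USAGE` schema. A minimal sketch of such a query might look as follows (available on editions that include the `ACCESS_HISTORY` view; column selection is illustrative):

```sql
-- See who accessed which objects recently, and when
SELECT user_name,
       query_start_time,
       direct_objects_accessed
FROM snowflake.account_usage.access_history
ORDER BY query_start_time DESC
LIMIT 100;
```

Queries like this let a data-product owner see which consumers actually use the product – useful feedback for the “data as a product” principle.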

Data Cloud: Another Snowflake feature that is very relevant for a Data Mesh architecture is that the Snowflake Cloud Data Platform is cloud-agnostic. Snowflake runs on AWS, Azure, and GCP, and will soon be available in more cloud regions than any single cloud infrastructure provider offers. Snowflake has built what they call the Data Cloud, which is by nature cross-region and cross-cloud: it links all these regions with each other, regardless of their origin. This makes it easy for large, global companies with a multi-cloud data landscape to share data with suppliers, partners, or other business units, and to implement Data Mesh.

Autonomous and self-serving: One of the four principles of Data Mesh is “self-serve data infrastructure as a platform”. The ability of domain teams to create and consume data products autonomously through platform abstractions depends heavily on how complex the platform is. Most data platforms today contain too many components and underlying technologies to handle structured, semi-structured, and unstructured data. As Zhamak Dehghani wrote in her article: “Self-serve infrastructure must include capabilities to lower the current cost and specialization needed to build data products”.
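In Snowflake, this self-serve autonomy can be sketched very concretely: a domain team can provision its own isolated compute and workspace with a couple of SQL statements, without a central operations team. The warehouse and database names below are hypothetical:

```sql
-- Domain team provisions its own isolated compute...
CREATE WAREHOUSE marketing_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 60      -- seconds of inactivity before compute suspends
       AUTO_RESUME = TRUE;    -- wakes up automatically on the next query

-- ...and its own workspace for building data products
CREATE DATABASE marketing_domain;
```

Because each warehouse is independent compute over shared storage, one domain’s workload never contends with another’s – which maps neatly onto autonomous nodes on the mesh.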

Low cost of ownership and better scalability: Snowflake is a SQL cloud data platform – a platform for big data and ordinary data alike, handling many data formats. All you need to learn is standard SQL. Snowflake can even help you recover historical data easily with Time Travel in case of accidental data loss. With Snowflake as a central component in your data platform, your architecture and technology landscape become less complex. Company specialists can then focus on business value and data advancement rather than maintenance, and as a result you can lower total cost of ownership and scale your platform better into new business areas.
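The Time Travel capability mentioned above is plain SQL as well. A brief sketch (the `orders` table is a placeholder, and the queries assume the data is still within the account’s configured retention window):

```sql
-- Query a table as it looked one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Query a table as of a specific point in time
SELECT * FROM orders AT (TIMESTAMP => '2021-05-01 12:00:00'::TIMESTAMP_LTZ);

-- Restore a table that was dropped by accident
UNDROP TABLE orders;
```

No separate backup tooling is needed for these scenarios, which is part of what keeps the overall platform footprint small.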

So by now you may have some idea of how Snowflake can help you deliver a single data experience – integrating storage, compute, and services across multiple geographies and clouds – and take your business to the next level. The Snowflake platform fits rather well with Data Mesh for handling large data volumes with speed and efficiency. Capgemini has strong competence in both Data Mesh and Snowflake, and is Snowflake’s EMEA GSI Partner of the Year 2021. Feel free to contact us if you need our help.

Meet our expert

Martin Lam

Head of Technology, Principal Solutions Architect, Capgemini Insights and data, Norway
Martin has extensive experience in business intelligence, data warehousing, data platforms, and analytics. His competence spans many areas, from architecture and data modelling to the design of solutions for business insights, balanced scorecards, advanced analytics, AI, and reporting. He has worked with many database technologies, ETL tools, reporting tools, and cloud platforms, including Azure, AWS, Google Cloud, and Snowflake.