Sharing without showing: Data clean rooms allow for unprecedented collaboration

Jennifer Belissent

26 July 2023

Pharmaceutical companies can identify the best hospitals for clinical trials with a look-alike analysis against patient records. Insurance companies can collaborate to identify fraudulent claims. Media outlets can offer advertisers premium placement to ensure targeted messaging. Loyalty programs can deliver truly personalized services across hotels, airlines, and other partners. Telecom operators can contribute location data to enrich those personalized services. Emergency and social services can collaborate to help those in need.

Yet in many cases, the relevant data is personal information, protected by privacy laws and by bonds of trust. How can that data be shared?

The use cases for secure collaboration with data clean rooms are endless

Imagine the following scenario.

A crowd of spectators is watching a big game and the teams are tied. The tension mounts. The fans grow restless. He shoots. He scores! The roar of the crowd can be heard all the way down the neighborhood street. And all the consumer brands want to know who is watching and how to reach these audiences. Yet, these sports fans are watching the game in the privacy of their homes, and the network they’re watching on must legally protect their data.

How can these media outlets share their viewer data – or the insights from it – without violating data protection laws and the trust of their subscribers?

It turns out that a similar question was posed by an academic in the early 1980s. Professor Andrew Yao introduced the problem: Alice and Bob, both millionaires, want to know which of them is richer, but neither wants to reveal his or her exact wealth. The solution to what became known as Yao's Millionaires' problem, a cryptographic protocol for secure two-party computation, proved that it is possible to share insights without showing the underlying data. Fortunately, modern methods do not require arduous manual calculations.

“Sharing without showing? You bet!”

Increased demand for data sharing

For potential advertisers or anyone who wants to collaborate with data, that’s great news. Data sharing and collaboration deliver business value. A recent Capgemini study, Data sharing masters, found that companies with collaborative data ecosystems reported better business outcomes including new revenues, reduced costs, increased productivity, and greater customer satisfaction. And that promise has spurred new data ecosystem initiatives.

Companies have long used their own data to better understand their customers or to improve operations. Increasingly, data teams turn to external data sources to enrich their internal data and enhance analytics. Budgets for external data are significant and growing. In a recent survey conducted by external data platform Explorium, 22 percent of respondents said they were spending more than $500,000 on external data, with 13 percent saying they spent more than $1 million (up from 7 percent in a similar survey in 2021).

Customer data was the most commonly acquired type of data: 52 percent purchased data on companies, followed by 44 percent purchasing demographic data. The number of sources has grown as well: 44 percent of firms acquire external data from five or more providers, up from only 9 percent the previous year. However, procuring external data is not without challenges, and regulatory constraints often top the list. Concerns about GDPR and other privacy regulations loom large, and for good reason.

Introducing modern data clean rooms

Not long ago, data sharing meant copying and sending files to a partner. That practice certainly complicated data governance: short of a manual audit, knowing who accessed the data and for what purpose was impossible. Now, using the principles demonstrated by Yao's Millionaires' problem, two or more parties can derive insights from data without revealing the underlying information.

With a Snowflake Global Data Clean Room, each party controls its own data, allowing governed, controlled analytics by other parties. That is to say, each party specifies who can access the data and for what purpose. Let’s take a look at how it would work with Yao’s two millionaires, Alice and Bob.

First, each party creates a table with the data to be shared. Then one party, let's say Bob, creates a table to store allowed statements: the queries that Bob will allow another party to run against his data. He then creates an access policy granting use of these statements and applies this access policy to his data table.
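The steps above can be sketched in a few lines of Python. This is not Snowflake syntax: it uses an in-memory SQLite database as a stand-in for the clean room, and the table names (`alice_data`, `bob_data`, `allowed_statements`), the wealth figures, and the `run_as` helper are all hypothetical. The "access policy" here is simply a check that lets another party run only statements Bob has allowlisted for them.

```python
import sqlite3

# In-memory SQLite as a hypothetical stand-in for a clean room (not Snowflake syntax).
conn = sqlite3.connect(":memory:")

# Step 1: each party creates a table with the data it is prepared to share.
conn.execute("CREATE TABLE alice_data (name TEXT, wealth INTEGER)")
conn.execute("CREATE TABLE bob_data (name TEXT, wealth INTEGER)")
conn.execute("INSERT INTO alice_data VALUES ('Alice', 5000000)")  # hypothetical figure
conn.execute("INSERT INTO bob_data VALUES ('Bob', 7000000)")      # hypothetical figure

# Step 2: Bob creates a table of allowed statements: who may run which exact query.
conn.execute("CREATE TABLE allowed_statements (grantee TEXT, statement TEXT)")


# Step 3: Bob's access policy, applied to bob_data. Anyone other than Bob may
# only run a statement that appears verbatim in allowed_statements for them.
def run_as(party: str, sql: str) -> list:
    touches_bob_data = "bob_data" in sql  # simplification: a naive text check
    if touches_bob_data and party != "Bob":
        allowed = conn.execute(
            "SELECT 1 FROM allowed_statements WHERE grantee = ? AND statement = ?",
            (party, sql),
        ).fetchone()
        if allowed is None:
            raise PermissionError(f"{party} is not allowed to run this statement")
    return conn.execute(sql).fetchall()


# Bob can read his own table; the allowlist is still empty, so Alice cannot.
print(run_as("Bob", "SELECT wealth FROM bob_data"))  # [(7000000,)]
try:
    run_as("Alice", "SELECT * FROM bob_data")
except PermissionError as err:
    print(err)  # Alice is not allowed to run this statement
```

A real clean room enforces the policy inside the platform rather than in client code; the text match on `bob_data` only keeps the sketch short.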

Next, Bob defines the exact statement or query he will allow, and inserts it into his “allowed statements” table. The statement includes the comparison of their wealth and the answers that will be returned in each case: “Bob is richer,” “Alice is richer,” or “Neither is richer.” Finally, he grants Alice permission to access and use his data for only this specific purpose. Alice then asks the question in the form of the specified query and receives the response: Bob is richer. Sorry, Alice.
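Bob's permitted statement can be sketched the same way. The example below is self-contained, so it repeats a minimal version of the hypothetical SQLite setup; both tables sit in one database purely for illustration, whereas in a clean room each party's data stays in its own account. The wealth figures, `COMPARE_WEALTH`, and `run_as` are assumptions. The comparison happens inside the query, and only one of the three labels is returned.

```python
import sqlite3

# Hypothetical stand-in setup (not Snowflake syntax).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alice_data (wealth INTEGER)")
conn.execute("CREATE TABLE bob_data (wealth INTEGER)")
conn.execute("INSERT INTO alice_data VALUES (5000000)")  # hypothetical figure
conn.execute("INSERT INTO bob_data VALUES (7000000)")    # hypothetical figure
conn.execute("CREATE TABLE allowed_statements (grantee TEXT, statement TEXT)")

# Bob's single permitted statement: it compares wealth but returns only a label.
COMPARE_WEALTH = """
SELECT CASE
         WHEN b.wealth > a.wealth THEN 'Bob is richer'
         WHEN a.wealth > b.wealth THEN 'Alice is richer'
         ELSE 'Neither is richer'
       END
FROM alice_data AS a CROSS JOIN bob_data AS b
"""

# Bob inserts the statement into his allowlist and grants it to Alice only.
conn.execute("INSERT INTO allowed_statements VALUES (?, ?)", ("Alice", COMPARE_WEALTH))


def run_as(party: str, sql: str) -> str:
    """Run sql only if Bob has allowlisted that exact statement for this party."""
    ok = conn.execute(
        "SELECT 1 FROM allowed_statements WHERE grantee = ? AND statement = ?",
        (party, sql),
    ).fetchone()
    if ok is None:
        raise PermissionError(f"{party} is not allowed to run this statement")
    return conn.execute(sql).fetchone()[0]


print(run_as("Alice", COMPARE_WEALTH))  # Bob is richer
```

Because the `CASE` expression is evaluated inside the query, Alice sees only the label, never Bob's figure, and any other statement she tries is rejected.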

Now imagine a more realistic business scenario where two companies want to know which customers they have in common – an overlap analysis. They would put the data in tables, establish the statements to compare their customer lists, and specify the information to be returned. Or one company might be interested in finding new prospects among a partner’s customers and would perform a look-alike analysis comparing customer attributes.
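An overlap analysis follows the same pattern: the agreed statement matches the two customer lists and returns only a count. The sketch below assumes, hypothetically, that both companies identify customers by email address, normalize it (trim and lowercase), and hash it with SHA-256 before matching; the sample lists are invented. An unsalted hash of an email address can be reversed by guessing common addresses, so here the hashing only illustrates matching on a shared key; the protection comes from returning nothing but the count.

```python
import hashlib


def normalize_and_hash(email: str) -> str:
    """Trim and lowercase an email address, then return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def overlap_count(customers_a: list[str], customers_b: list[str]) -> int:
    """Return only the number of shared customers, never the shared identities."""
    hashed_a = {normalize_and_hash(e) for e in customers_a}
    hashed_b = {normalize_and_hash(e) for e in customers_b}
    return len(hashed_a & hashed_b)


# Hypothetical customer lists held by two companies.
company_a = ["ann@example.com", "Ben@Example.com ", "cara@example.com"]
company_b = ["ben@example.com", "cara@example.com", "dan@example.com"]

print(overlap_count(company_a, company_b))  # 2
```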

Data clean rooms transform the ad world

In a real use case, commonly seen in media and advertising these days, brands want to optimize their ad spend through better targeting to specific customers or personas – like the fans watching that exciting game. Media outlets want to offer premium placements by knowing exactly which programming the brand’s customers are watching. Comparing customers is a win-win. However, neither wants to show the underlying data. The clean room allows them to share without showing. In this case, as illustrated in the diagram, the returned information would include a customer count for each of the media outlet’s programs, but not specific customer data, in order to ensure compliance with privacy regulations. All queries of the data would be monitored and logged for audit purposes.
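For the media scenario, the agreed statement might join the brand's customer list with the outlet's viewing data and return only a customer count per program, recording every call in an audit log. The sketch again uses SQLite as a hypothetical stand-in; the table names, sample rows, `audit_log`, and the minimum group size `MIN_COUNT` are assumptions. The threshold is not described in the article: it is a common extra safeguard that suppresses any program whose count is small enough to point to individuals.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical data: the brand's customer IDs and the outlet's viewing records.
conn.execute("CREATE TABLE brand_customers (customer_id TEXT)")
conn.execute("CREATE TABLE viewership (customer_id TEXT, program TEXT)")
conn.executemany("INSERT INTO brand_customers VALUES (?)",
                 [("c1",), ("c2",), ("c3",), ("c4",)])
conn.executemany(
    "INSERT INTO viewership VALUES (?, ?)",
    [("c1", "Big Game"), ("c2", "Big Game"), ("c3", "Big Game"),
     ("c4", "Cooking Show"), ("c9", "Cooking Show")],
)

MIN_COUNT = 3    # hypothetical: suppress programs with fewer matched customers
audit_log = []   # every query is recorded: (UTC timestamp, requester, statement name)

# Aggregate-only statement: counts per program, no customer-level rows.
COUNT_BY_PROGRAM = """
SELECT v.program, COUNT(DISTINCT v.customer_id) AS customers
FROM viewership AS v
JOIN brand_customers AS b ON b.customer_id = v.customer_id
GROUP BY v.program
HAVING COUNT(DISTINCT v.customer_id) >= ?
ORDER BY v.program
"""


def counts_by_program(requester: str) -> list:
    """Log the request, then return matched-customer counts per program."""
    audit_log.append((datetime.now(timezone.utc).isoformat(), requester, "COUNT_BY_PROGRAM"))
    return conn.execute(COUNT_BY_PROGRAM, (MIN_COUNT,)).fetchall()


print(counts_by_program("brand"))  # [('Big Game', 3)]
```

"Cooking Show" matches only one of the brand's customers, so it falls below the threshold and is left out of the result.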

In the past, this scenario required data to be copied and moved across the AdTech value chain from enrichment to activation to attribution. Not only were there the aforementioned governance concerns, but that data was also immediately stale. With Snowflake, live, near real-time data can be shared where it resides, with no copies necessary. Data governance capabilities allow all parties to assign access and use policies that limit both who can query the data and exactly which queries are allowed. Additional capabilities add further security to the clean room. Data can be encrypted, anonymized, tokenized, or pseudonymized with built-in hashing functions, or obfuscated with data masking or by injecting differential privacy noise.
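Two of the protections mentioned above can be illustrated with Python's standard library: keyed hashing (HMAC-SHA256) to pseudonymize identifiers, and the Laplace mechanism, a standard way of adding differential-privacy noise to a count. This is a generic sketch, not Snowflake's built-in functions; the secret key, the epsilon value, and the sample count are hypothetical. A smaller epsilon means more noise and stronger privacy.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"hypothetical-shared-secret"  # assumption: agreed between parties out of band


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256): the same input and key always give the same token,
    but without the key the token cannot feasibly be linked back to the identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Laplace mechanism for a counting query (sensitivity 1): scale = 1 / epsilon."""
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    return max(0, round(noisy))  # rounding and clamping are post-processing


rng = random.Random(42)
print(pseudonymize("Ben@Example.com") == pseudonymize("ben@example.com"))  # True
print(dp_count(1200, epsilon=0.5, rng=rng))  # a noisy count near 1200; varies with the seed
```

Rounding and clamping at zero are post-processing steps, which do not weaken the differential-privacy guarantee.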

With today’s technology, data clean rooms allow parties across teams, companies, government agencies, and international organizations to collaborate and securely share sensitive or regulated data. As Thomas Edison said, “The value of an idea lies in the use of it.” The more data is used, the more value is created. Secure data collaboration accelerates value creation.

INNOVATION TAKEAWAYS

CROSS-INDUSTRY COLLABORATION AND DATA SHARING

A growing trend that’s here to stay.

DATA CLEAN ROOMS FACILITATE JOINT DATA ANALYSIS AND ML

While ensuring that confidential information will stay protected from sharing partners.

DATA ECOSYSTEMS AND SECURE DATA COLLABORATION

They accelerate value creation.

Interesting read?

Capgemini’s Innovation publication, Data-powered Innovation Review | Wave 6, features 19 such fascinating articles, crafted by leading experts from Capgemini and key technology partners like Google, Starburst, Microsoft, Snowflake, and Databricks. Learn about generative AI, collaborative data ecosystems, and how data and AI can enable the biodiversity of urban forests.

Jennifer Belissent joined Snowflake as Principal Data Strategist in 2021. Prior to joining Snowflake, Jennifer spent 12 years at Forrester Research as an internationally recognized expert in data sharing and the data economy, data leadership and literacy, and best practices in building world-class data organizations. At Snowflake, Jennifer helps customers develop Data Cloud strategies that facilitate data access and deliver business value. Jennifer earned a Ph.D. and an M.A. in political science from Stanford University and a B.A. in econometrics from the University of Virginia.