The real data advantage in financial services is still locked. See how modern architectures are turning compliance into a competitive edge.

Let’s start with a fact that should keep every CMO in financial services up at night (celebrating or commiserating): you almost certainly sit on the richest seam of first-party customer data of any industry on earth. Richer than retail. Richer than media. Richer than travel.

Think about what you actually know about your customers. Transaction patterns, credit behaviour, account hierarchies, product holdings, life-stage signals, household relationships, risk scores, payment history: the lot. Where a retailer knows what someone bought last Tuesday, you know how they’ve managed their money for the last decade. That’s not a marginal advantage. That’s a different game entirely.

And yet, in practice, financial services institutions (FSIs) consistently lag behind retail, travel, and media when it comes to personalised marketing at scale. The data is there. The ambition is there. The results, far too often, aren’t.

Why not?

The compliance wall

The problem isn’t the data. It’s what you’re allowed to do with it — and where.

Traditional marketing activation requires moving data. You pull customer records, enrich them with behavioural signals (clickstream, web events, app interactions), stitch them together into meaningful segments, and push them out to media channels and engagement platforms. In most industries, that’s a relatively straightforward data engineering exercise.

In financial services, it’s a compliance minefield. Moving sensitive customer data (transaction history, credit signals, account relationships) outside your secure environment triggers a cascade of regulatory review, legal sign-off, and data governance scrutiny that can take months to resolve. And quite often, the answer at the end of it is simply: you can’t do it.

So FSIs end up in an uncomfortable position. They have extraordinary first-party data they’re not meaningfully activating. They’re running campaigns off surface-level behavioural signals — email opens, web clicks — whilst their most valuable customer intelligence sits locked inside their data warehouse, inaccessible to marketing. The result? Campaigns that feel generic, targeting that misses, and personalisation that doesn’t really personalise.

Meanwhile, neobanks, fintechs, and digital-first challengers are eating into market share by moving faster with less data but better tooling. The competitive gap is widening.

The answer is already in your infrastructure

Here’s the good news: the solution doesn’t require exporting your data anywhere. It requires doing the intelligence work where the data already lives — inside your own environment.

The approach looks like this. You resolve your identity graph directly on your cloud data warehouse — whether that’s Databricks, Snowflake, BigQuery, or similar. You do all your audience building, segmentation, enrichment, and decisioning in that governed environment. Then, and only then, you output the absolute minimum required to activate on external channels: a hashed, pseudonymised identifier with no PII attached. The kind of signal that says ‘this person, right now, needs to see this’ — without the channel knowing anything about who that person actually is.
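To make the "absolute minimum" output concrete, here is a minimal Python sketch of that last step. It assumes the identifier is an email address hashed with SHA-256 after normalisation; the function names, field names, and sample row are hypothetical, and a real deployment would follow whatever matching format each channel specifies.

```python
import hashlib


def normalise_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address always hashes the same."""
    return email.strip().lower()


def pseudonymise(email: str) -> str:
    """Return the SHA-256 hex digest of the normalised email address."""
    return hashlib.sha256(normalise_email(email).encode("utf-8")).hexdigest()


def build_activation_payload(audience: list[dict]) -> list[dict]:
    """Keep only the hashed identifier and the segment label for the channel."""
    return [
        {"hashed_id": pseudonymise(row["email"]), "segment": row["segment"]}
        for row in audience
    ]


# Hypothetical row as it might exist inside the warehouse.
audience = [
    {"email": " Jane.Doe@example.com ", "name": "Jane Doe", "segment": "renewal_risk"},
]

payload = build_activation_payload(audience)
print(payload)  # one dict per customer, containing only hashed_id and segment
```

The name and the raw email stay in the warehouse; only the digest and the segment label are handed over. Whether a hashed identifier counts as personal data under a given regime is a question for your compliance team, not something the code settles.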

No data duplication. No third party holding your customers’ sensitive information. No raw customer data leaving your perimeter, which removes the usual trigger for a lengthy compliance review.

This is sometimes called zero-copy activation, and it’s the architectural unlock that FSIs have been waiting for.

Enter the Composable CDP

The technology category that makes this possible is the composable CDP — and it’s probably the hottest conversation in MarTech right now.

A composable CDP works fundamentally differently from the traditional “packaged” CDP most organisations evaluated five or ten years ago. Traditional CDPs ingested your data into their own cloud environment, gave you segmentation tools, and activated from there. For FSIs, that model was broken from day one: duplicating regulated customer data into a vendor’s platform is simply not a viable option.

A composable CDP flips the model. Instead of pulling your data out, it operates natively inside your existing data infrastructure — sitting on top of your warehouse and activating directly from it. Your data doesn’t move. The CDP is an activation and intelligence layer, not a competing data store. Four things define a truly composable architecture:

  • Warehouse-native: your data warehouse remains the single source of truth
  • Zero-copy: data is never duplicated into a vendor environment
  • Schema-agnostic: works with your actual data model, not a rigid vendor template
  • Governed by design: compliance filters, consent rules, and access controls are enforced at the data layer before any marketer touches a segment

That last point is particularly significant for FSIs. Protected-class restrictions, suppression lists, and permitted data use policies are baked in automatically — not bolted on afterwards.
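As an illustration of "governed by design", the sketch below applies consent and suppression rules before a segment is ever returned to a marketer. It is a simplified assumption of how such a filter might look in Python; the `Customer` fields and the `governed_audience` function are hypothetical, and in practice these rules would typically be enforced in the warehouse itself through views or policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    marketing_consent: bool  # has the customer opted in to marketing?
    suppressed: bool         # e.g. on a do-not-contact or suppression list


def governed_audience(candidates: list[Customer]) -> list[Customer]:
    """Return only customers who have consented and are not suppressed."""
    return [c for c in candidates if c.marketing_consent and not c.suppressed]


# Hypothetical candidates for a campaign.
candidates = [
    Customer("c1", marketing_consent=True, suppressed=False),
    Customer("c2", marketing_consent=False, suppressed=False),
    Customer("c3", marketing_consent=True, suppressed=True),
]

eligible_ids = [c.customer_id for c in governed_audience(candidates)]
print(eligible_ids)
```

Because the filter runs before segmentation, a marketer building an audience never sees the excluded customers in the first place.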

The market has taken notice. Composable, warehouse-native CDP vendors grew headcount 7.8% in the second half of 2025, six times the 1.3% industry average. More than a quarter of CDPs now support a warehouse-centric architecture. This isn’t a niche trend; it’s a structural shift in how enterprise marketing infrastructure is being built.

Three vendors are leading the composable charge. Hightouch, which recently raised $150M in a Series D at a $2.75 billion valuation, co-led by Goldman Sachs and Bain Capital Ventures, is the most established pure-play, known for its reverse ETL capabilities and deep warehouse integrations across Snowflake, Databricks, and BigQuery. Treasure Data brings enterprise-grade composable capabilities with native journey orchestration and AI decisioning built in. And Amperity has built a strong reputation specifically around identity resolution, with patented ML-based matching that’s particularly relevant in complex, fragmented data environments.

The fact that Databricks itself entered the market last week with CustomerLake — a fully native agentic CDP embedded directly in the lakehouse — tells you everything about where the industry is heading. When the warehouse platform launches its own CDP, the composable thesis has been well and truly validated.

The Identity Graph: the real unlock

At the heart of making all of this work is the identity graph — and it’s worth spending a moment on what that actually means, because it’s where the real value for FSIs lies.

An identity graph is the data structure that connects customer signals across systems, devices, products, and channels into a single unified profile; identity resolution is the process of building and maintaining it. In financial services, that means stitching together a customer’s current account, their mortgage, their credit card, their ISA, their wealth management relationship, their mobile app sessions, and their branch interactions into one coherent view of who that person is and what they need.
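One simple way to picture the stitching is as a graph problem: product records are nodes, a shared identifier is an edge, and each connected component is one customer. The Python sketch below uses a union-find structure to do this for exact matches on email or phone. The record layout and the `resolve_identities` function are illustrative assumptions, not a description of any vendor’s method.

```python
from collections import defaultdict


def resolve_identities(records: list[dict]) -> list[set[str]]:
    """Group record IDs that share an email or phone, directly or transitively."""
    parent = {r["record_id"]: r["record_id"] for r in records}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: dict[tuple[str, str], str] = {}
    for r in records:
        for key in ("email", "phone"):
            value = r.get(key)
            if not value:
                continue  # missing identifiers never link records
            ident = (key, value)
            if ident in first_seen:
                union(r["record_id"], first_seen[ident])
            else:
                first_seen[ident] = r["record_id"]

    clusters: dict[str, set[str]] = defaultdict(set)
    for r in records:
        clusters[find(r["record_id"])].add(r["record_id"])
    return list(clusters.values())


# Hypothetical product-level records held in separate systems.
records = [
    {"record_id": "current_account-1", "email": "a@example.com", "phone": "0700"},
    {"record_id": "mortgage-7", "email": None, "phone": "0700"},
    {"record_id": "credit_card-3", "email": "a@example.com"},
    {"record_id": "isa-9", "email": "b@example.com"},
]

profiles = resolve_identities(records)
print(profiles)
```

Here the mortgage and the credit card never share an identifier with each other, yet both end up in the same profile because each links to the current account.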

Without that layer, most FSIs are marketing to isolated product holders rather than complete customer relationships. Your credit card team doesn’t know that customer also holds a mortgage with you. Your wealth management team markets independently of retail banking. Suppression, personalisation, and measurement are all fragmented as a result.

The conventional approach to identity resolution — sending customer data to a third-party graph vendor for matching and enrichment — is a non-starter for FSIs, for the same compliance reasons discussed above. The moment that data leaves your walls, the governance cascade begins.

The composable approach solves this elegantly. Identity resolution runs directly inside your own cloud environment — within your Virtual Private Cloud, governed by your own security certifications and access controls. The matching logic, graph rules, and resolved identity spine all remain behind your perimeter. Your data science team builds, iterates, and improves matching quality using the full depth of signals available in the warehouse, including signals that would never be permissible to share externally.

The result is an identity graph that is fully auditable, inspectable, and governed by your rules — not a vendor’s proprietary methodology. And it unlocks capabilities that simply weren’t accessible before: cross-line-of-business profile assembly, governance enforced at the identity layer itself, and durable measurement and attribution anchored to a stable first-party identity rather than depreciating third-party cookies.

A concrete example: motor insurance renewal

Let’s make this tangible with an insurance use case, because it’s one of the strongest illustrations of what this architecture actually unlocks.

A motor insurer holds genuinely rich first-party data on every policyholder: vehicle details, claims history, payment behaviour, years as a customer, household composition, any additional policies held. That data is extraordinarily valuable for predicting renewal risk — who is likely to lapse, who is price-sensitive, who is actively shopping around.

Now layer in clickstream data: which customers have visited a price comparison website in the past fortnight. Which ones have browsed competitor landing pages. Which ones have started but not completed a quote journey on your own site. These are real-time intent signals that, combined with the rich first-party policy data, create an incredibly precise picture of renewal risk at the individual level.

In a traditional setup, you can’t combine these two data sets inside a compliant marketing activation workflow. The clickstream data sits in your analytics environment; the policy data sits in your core systems; and getting them into a traditional CDP together means moving sensitive PII outside your secure perimeter.

With a composable CDP and an in-warehouse identity graph, the whole thing happens inside your environment. You identify the high-risk renewal cohort — customers with three or more comparison site visits in the past fourteen days, combined with price sensitivity signals from their payment history and a sub-three-year tenure. You build the audience segment entirely within your warehouse. You then output only a hashed identifier — no name, no postcode, no policy number — to paid social and programmatic channels. That customer sees a precisely timed retention offer three days before their renewal date.
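The cohort rule described above can be sketched in a few lines of Python. This is an illustration only: the field names (`comparison_site_visits`, `price_sensitive`, `tenure_years`) and the sample policyholders are hypothetical, and in production the same logic would more likely be expressed as a query running in the warehouse.

```python
from datetime import date, timedelta


def high_risk_renewal_cohort(policyholders: list[dict], today: date) -> list[str]:
    """Return IDs of policyholders matching all three renewal-risk criteria."""
    window_start = today - timedelta(days=14)
    cohort = []
    for p in policyholders:
        # Count comparison-site visits that fall inside the fourteen-day window.
        recent_visits = [
            d for d in p["comparison_site_visits"] if window_start <= d <= today
        ]
        if len(recent_visits) >= 3 and p["price_sensitive"] and p["tenure_years"] < 3:
            cohort.append(p["policyholder_id"])
    return cohort


today = date(2025, 6, 15)

# Hypothetical policyholders already joined to their clickstream signals.
policyholders = [
    {   # three recent visits, price-sensitive, short tenure: qualifies
        "policyholder_id": "p1",
        "comparison_site_visits": [date(2025, 6, 3), date(2025, 6, 8), date(2025, 6, 14)],
        "price_sensitive": True,
        "tenure_years": 2,
    },
    {   # enough visits, but a long-standing customer: excluded
        "policyholder_id": "p2",
        "comparison_site_visits": [date(2025, 6, 5), date(2025, 6, 9), date(2025, 6, 12)],
        "price_sensitive": True,
        "tenure_years": 5,
    },
    {   # only two visits fall inside the window: excluded
        "policyholder_id": "p3",
        "comparison_site_visits": [date(2025, 5, 20), date(2025, 6, 2), date(2025, 6, 10)],
        "price_sensitive": True,
        "tenure_years": 1,
    },
]

cohort = high_risk_renewal_cohort(policyholders, today)
print(cohort)
```

Only the resulting IDs, once hashed, would need to leave the warehouse; the visit history, tenure, and payment signals that produced them stay where they are.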

No PII left your environment. No compliance breach. A campaign that would have been impossible six months ago is now running in production.

The remaining challenges

It would be dishonest to present this as a solved problem, because the architecture only works if the underlying conditions are met — and for most FSIs, getting there requires genuine organisational change.

The first challenge is data connectivity. A composable CDP is only as powerful as the data you can bring into the warehouse. That means every relevant first-party data source (core banking systems, CRM platforms, servicing tools, mobile apps, call centre data, branch interactions) needs to be connected via robust, well-maintained ingestion (ETL/ELT) pipelines and APIs; reverse ETL only covers the outbound, activation side. In practice, many FSIs have fragmented data estates with significant legacy infrastructure, and the work of getting everything into the warehouse in a usable, timely state is substantial.

The second challenge is internal governance and cross-team friction. The composable model requires marketing, data engineering, compliance, and IT to work from a shared data layer under shared rules. That’s straightforward in principle and genuinely difficult in practice at large institutions, where data ownership is contested, cross-team data sharing requires formal approval processes, and the compliance function operates as a gate rather than a partner. Getting the organisational model right — building the processes, the trust, and the shared language between teams — is at least as hard as getting the technology right.

Neither of these challenges is insurmountable. But they require a clear strategy, executive sponsorship, and the willingness to treat data infrastructure as a strategic marketing asset rather than an IT cost centre.

Where Capgemini FS is taking this next

At Capgemini, we’re working at the frontier of exactly this problem. Specifically, we’re applying Agentic AI to the identity graph layer itself — using autonomous agents to continuously improve matching quality, surface cross-line-of-business connections that rule-based systems miss, and accelerate the resolution process without requiring constant manual intervention from data engineering teams.

The identity graph is the hardest part of the composable stack to get right at scale. Deterministic matching — matching on exact identifiers like email address or account number — is reliable but incomplete. Probabilistic matching — inferring connections from behavioural and contextual signals — is powerful but requires ongoing model maintenance. Agentic AI sits in the middle: continuously learning, adapting, and improving resolution quality in a way that neither pure-deterministic nor static-probabilistic approaches can achieve.
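The difference between the two matching styles is easier to see side by side. The Python sketch below is a toy comparison, assuming an exact email match for the deterministic case and a weighted blend of name similarity and postcode agreement for the probabilistic one. The weights and the sample records are arbitrary choices for illustration, not tuned values, and real probabilistic matching would use trained models over many more signals.

```python
from difflib import SequenceMatcher


def deterministic_match(a: dict, b: dict) -> bool:
    """Match only when both records carry the same non-empty email."""
    return bool(a.get("email")) and a.get("email") == b.get("email")


def probabilistic_score(a: dict, b: dict) -> float:
    """Blend fuzzy name similarity with postcode agreement into a 0..1 score."""
    name_similarity = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    same_postcode = 1.0 if a["postcode"] == b["postcode"] else 0.0
    # Illustrative weights; a real system would learn or calibrate these.
    return 0.7 * name_similarity + 0.3 * same_postcode


# Hypothetical records: the mortgage record has a misspelt name and no email.
card = {"name": "Katherine Smith", "postcode": "AB1 2CD", "email": "k.smith@example.com"}
mortgage = {"name": "Kathrine Smith", "postcode": "AB1 2CD", "email": None}
stranger = {"name": "John Patel", "postcode": "ZZ9 9ZZ", "email": None}

print(deterministic_match(card, mortgage))   # exact email comparison
print(probabilistic_score(card, mortgage))   # high: similar name, same postcode
print(probabilistic_score(card, stranger))   # low: different name and postcode
```

The deterministic rule misses the link between the card and the mortgage because there is no shared email, while the probabilistic score picks it up. The cost is a threshold that has to be chosen and maintained, which is the ongoing model maintenance referred to above.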

We believe this is where the meaningful competitive differentiation in FSI marketing will be won over the next few years — not in the activation layer, but in the quality and depth of the identity foundation underneath it.

Now is the time to act

The data goldmine has always been there. The compliance constraints are real but not insurmountable. The architecture to work within them — composable CDP, in-warehouse identity resolution, zero-copy activation — is mature, proven, and increasingly accessible.

The FSIs that move now will spend the next few years building compounding advantage: richer identity graphs, more precise targeting, better measurement, and campaigns that actually feel personal to customers who have come to expect the opposite from their financial providers.

The opportunity to finally activate the whole rich data set you’ve been sitting on? It’s here. The question is: are you ready to harness it?