For Speed, Semantics, and Sustainability

The demands of modern AI, especially generative and agentic AI, couldn’t be higher. These workloads thrive on unstructured data, dynamic token flows, embedded meaning, and real-time context. Trying to force that through yesterday’s architecture is like trying to do real-time translation with a fax machine.

What’s needed isn’t another incremental feature; it’s a reimagination of the data lakehouse for the Agentic AI era. One that treats compute, storage, and networking as first-class, GPU-native, and semantically rich from the start. A lakehouse capable of handling structured data and multi-modal unstructured content, batch and real-time flows, and semantic retrieval in a single fabric.

InsightGrid: The Clarifying Voice in AI Data Platforms

While many platforms claim to be “AI-ready,” they often bolt generative AI features onto legacy CPU-centric architectures without rethinking data flow, execution semantics, or cost. Capgemini developed InsightGrid, accelerated by NVIDIA technologies and development frameworks, as a GPU-native reimagination of the data lakehouse. This lakehouse was engineered to unify structured and unstructured data, including multi-modal content across batch and real-time enterprise data pipelines, all without fragmenting storage, compute, or governance across separate systems.

To appreciate what this architecture unlocks, consider three converging realities facing modern financial enterprises.

  • Capital markets firms must meet regulatory mandates such as T+1 settlement, requiring near-instant reconciliation across post-trade data flows.
  • Banks must intercept fraud, validate payment instructions, and reduce abandonment in onboarding flows using both structured signals and unstructured evidence, often in real time.
  • Insurers face similar urgency, where delays in quoting, claims follow-up, or service response lead to silent attrition.

Whether signals arrive as streams, files, documents, transcripts, images, or media, the margin for delay is gone. What’s needed is not just faster data, but fused, validated, semantically aligned intelligence at the moment of arrival.

Benchmark testing has shown InsightGrid pipelines achieving 5–7× performance improvements over comparable CPU-based stacks, while reducing infrastructure costs by 60–80%. Storage and compute are disaggregated, supporting cloud and hybrid deployments without sacrificing throughput, latency, or observability.

A Lakehouse Built for Semantics, Not Silos

InsightGrid is designed to replace the fragmented enterprise pattern where:

  • structured data lives in tables,
  • unstructured content lives in object stores,
  • embeddings live in vector databases,
  • and reconciliation happens downstream.

Instead, InsightGrid treats records, events, tokens, vectors, and media as first-class citizens within a single GPU-native lakehouse fabric.

InsightGrid Integrates

  • NVIDIA RAPIDS, Polars, and Ray software for distributed, in-memory GPU computation across ingestion, transformation, analytics, and ML, eliminating JVM overhead.
  • Amazon FSx for Lustre, paired with NVIDIA Magnum IO™ GPUDirect Storage (GDS), enabling zero-copy access to data directly from GPU memory.
  • Apache Iceberg as the transactional lakehouse substrate, providing ACID guarantees, time travel, and schema evolution across all grids.
  • DuckDB and in-stream metadata handlers for near-real-time metadata propagation, observability, and lineage without CPU bottlenecks.
  • High-performance Amazon EC2 P6e UltraServers and P6 instances with Elastic Fabric Adapter (EFA) for multi-node scale, and NVIDIA NVLink/NVLink Switch for optimized single-node multi-GPU communication.

This is not incremental modernization. It is a foundational reset of the lakehouse for real-time, semantic, GPU-first execution.

The Four Grids of InsightGrid

#1. SentinelGrid: Trust at Ingest

InsightGrid enforces data integrity at the moment of arrival. SentinelGrid applies schema validation, quality rules, and ACID constraints using Iceberg and GPU-native operators. Non-compliant records are quarantined immediately, preserving downstream trust without retroactive remediation.

Trust is not inferred later; it is established upfront.

#2. ConcordGrid: Structured + Embedded Fusion (Without Joins)

Unstructured data (text, PDFs, images, audio, and video) is processed through GPU-accelerated pipelines and embedded into a single joint embedding space, allowing semantic comparison across modalities.

The Unified Vector Space

A document, a photo, a phone call: different signals, same underlying reality. A unified vector space converts them into one mathematical language so an agent can compare, combine, and reason across all three simultaneously.
Risk hides in the gap between modalities. Close the gap and you see it. Keep the systems separated and you never do.
This matters most for agentic AI because an agent makes a chain of decisions, each conditioned on everything before it. Lose a modality mid‑chain and the chain breaks. The unified space is what keeps the agent coherent step to step, signal to signal.

Banking: KYC, scanned ID, proof of address, video onboarding call. Processed separately, a synthetic identity sails through. Processed together, the agent flags that the face doesn’t match the document age, the address is a mail drop, and the script was read, not spoken.
Three innocuous signals. One damning intersection.
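What "one mathematical language" buys you can be shown with cosine similarity. The vectors below are toy 4-dimensional stand-ins for real multi-modal embeddings (which would have hundreds of dimensions and come from a trained encoder); the point is only that once every modality lives in the same space, any pair can be compared directly.

```python
import numpy as np

# Invented toy embeddings in a shared space; real ones come from a
# multi-modal encoder, not hand-written values.
doc_vec   = np.array([0.9, 0.1, 0.0, 0.1])    # scanned ID document
image_vec = np.array([0.85, 0.2, 0.05, 0.1])  # ID photo
audio_vec = np.array([0.1, 0.05, 0.9, 0.2])   # onboarding call

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document and photo agree; the call does not match either.
doc_vs_image = cosine(doc_vec, image_vec)
doc_vs_audio = cosine(doc_vec, audio_vec)
```

An agent scoring all three pairwise similarities sees the "damning intersection" the text describes; three systems each holding one modality never compute those cross-modal scores at all.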

At embedding time, each artifact is indexed using enterprise identifiers such as customer ID, policy number, transaction ID, timestamp, channel, and jurisdiction, following the reference-indexing model used in enterprise content management (ECM) systems.

These indices do not create relational joins.
They provide deterministic referential alignment between embedded content and structured records while preserving semantic independence.

Structured attributes act as governance anchors, while embeddings carry meaning. Together, they enable:

  • semantic search across all media types,
  • cross-modal retrieval,

all without vector silos, secondary databases, or SQL execution semantics.
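A minimal sketch of this reference-indexing idea: each embedding carries enterprise identifiers as metadata, so retrieval filters deterministically on a governance anchor first and ranks semantically second, with no relational join. The store layout, field names, and vectors here are all assumptions for illustration.

```python
import numpy as np

# Each entry pairs an embedding with enterprise identifiers (invented names).
store = [
    {"customer_id": "C-100", "channel": "kyc_doc", "vec": np.array([0.9, 0.1])},
    {"customer_id": "C-100", "channel": "onboard", "vec": np.array([0.8, 0.3])},
    {"customer_id": "C-200", "channel": "kyc_doc", "vec": np.array([0.1, 0.9])},
]

def retrieve(query: np.ndarray, customer_id: str) -> list[dict]:
    # Step 1: deterministic referential alignment (governance anchor).
    scoped = [e for e in store if e["customer_id"] == customer_id]
    # Step 2: semantic ranking within the governed scope.
    def score(e: dict) -> float:
        v = e["vec"]
        return float(query @ v / (np.linalg.norm(query) * np.linalg.norm(v)))
    return sorted(scoped, key=score, reverse=True)

hits = retrieve(np.array([1.0, 0.0]), "C-100")
```

The filter is exact and auditable while the ranking stays purely semantic, which is the "governance anchors plus meaning" split the section describes.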

#3. SignalGrid: Ad Hoc Analytics and AI Workbench

SignalGrid serves as the lakehouse interaction layer for analysts and data scientists. It supports GPU-native feature pipelines, exploratory analytics, and real-time dashboards that combine structured metrics with semantically indexed content.

Users can navigate enterprise data for information, insight, and inference without being constrained by batch windows or pre-modeled aggregates.

#4. PulseGrid: KPI Cohort Monitoring and Decomposition

PulseGrid models business telemetry as tensors, not tables. Powered by 𝝉DB, it enables real-time decomposition of KPIs and OKRs across cohorts, segments, and time without pre-aggregated cubes or static dashboards.

Tensors — The Shape of Reality

Traditional ML sees a customer as a point: smoothed, averaged, approximated against its neighbors. A tensor treats every entity as exactly what it is: a specific customer, a specific policy, a specific period. Not an average. Not a midpoint. A precise coordinate where modes intersect and meaning lives at that crossing.

For agentic AI this isn’t a modeling preference, it’s a structural requirement. Every decision the agent makes becomes the ground its next decision stands on. Blur the intersection and the agent starts its next step from the wrong place. Errors don’t cancel. They stack.

Insurance: A new homeowner. Dwelling coverage, no contents coverage. January. Three unremarkable facts until the agent intersects them: first winter, uninsured household contents, the month American pipes freeze. It reaches out in December and gets the contents covered. The flat model waits for the flood.

This allows organizations to understand which segments are driving observed outcomes, how strongly they contribute, and how those drivers evolve over time.
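The decomposition idea can be shown on a toy KPI tensor. The modes, labels, and numbers below are invented, and this plain-NumPy sketch is not 𝝉DB; it only illustrates how modeling a KPI with explicit modes lets you attribute an outcome to cohorts without pre-aggregated cubes.

```python
import numpy as np

# Toy KPI tensor with modes (cohort, channel, month); values are invented.
kpi = np.array([
    [[100., 120.],    # cohort "new",     channel "web"
     [ 40.,  35.]],   # cohort "new",     channel "branch"
    [[200., 180.],    # cohort "tenured", channel "web"
     [ 60.,  65.]],   # cohort "tenured", channel "branch"
])

# Headline KPI per month: collapse the cohort and channel modes.
total_by_month = kpi.sum(axis=(0, 1))

# Each cohort's contribution share per month: collapse channel only,
# then normalize by the monthly total.
cohort_share = kpi.sum(axis=1) / total_by_month
```

Because every slice keeps its coordinates, "which segment drove the change, and by how much" is a reduction over named modes rather than a join against a dashboard extract.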

PulseGrid was demonstrated live at the NVIDIA booth during AWS re:Invent 2025, showcasing real-time KPI decomposition over production-grade GPU infrastructure.

How Data Moves

InsightGrid uses Apache Iceberg over Amazon S3 as the transactional lakehouse substrate, synchronized via Data Repository Association (DRA) with Amazon FSx for Lustre as GPU-native scratch space optimized for GDS.

All transformations execute eagerly using the Polars library and the NVIDIA RAPIDS SDK, operating directly in GPU memory. Metadata such as freshness, cardinality, and lineage is propagated in near real time using DuckDB and custom in-stream handlers.

Unstructured content flows through a dedicated embedding pipeline, producing embeddings within a joint semantic space. These embeddings are aligned to structured enterprise context using shared indices—not relational execution plans—enabling governed semantic analytics across modalities.
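The in-stream metadata propagation described above can be sketched as a handler that updates freshness and cardinality as each micro-batch lands. The function name, fields, and plain-dict catalog are assumptions for illustration; InsightGrid persists this state via DuckDB rather than in process memory.

```python
import datetime as dt

# Per-table metadata catalog; a stand-in for a DuckDB-backed store.
catalog: dict[str, dict] = {}

def on_batch(table: str, rows: list[dict]) -> None:
    """Update cardinality and freshness as a micro-batch arrives."""
    entry = catalog.setdefault(table, {"row_count": 0, "last_seen": None})
    entry["row_count"] += len(rows)
    entry["last_seen"] = dt.datetime.now(dt.timezone.utc)

# Two micro-batches landing for the same table.
on_batch("trades", [{"id": 1}, {"id": 2}])
on_batch("trades", [{"id": 3}])
```

Updating metadata on the ingest path, rather than scanning tables after the fact, is what keeps observability and lineage current without adding a CPU-bound crawl.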

“InsightGrid is not a patchwork of tools. It is an engineered system, designed to run real production workloads at scale.”

Benchmarks that Matter

Performance

  • Multi-billion-row transformations execute several times faster than JVM-based pipelines.
  • Semantic retrieval scales linearly with GPU capacity.

Cost Efficiency

  • ETL and analytics costs are reduced by 2–4× versus CPU-centric stacks.
  • Storage consolidation eliminates duplication across batch, stream, and unstructured silos.

Carbon Impact

  • GPU-native execution reduces energy per workload.
  • Shared GPU infrastructure for data and AI lowers overall enterprise carbon footprint.

Capgemini: Delivering the AI Factory

Capgemini is building InsightGrid in collaboration with AWS and NVIDIA, with support from the AWS Generative AI Innovation Center (GenAIIC) Partner Innovation Alliance, integrating it into enterprise transformation programs as the foundation for the AI Factory.

Our role includes platform blueprinting, infrastructure-as-code deployment, GPU cluster optimization, and integrated governance and lineage.

This is not a vision.

It is running.


Final Word

Let’s be honest: the data stack most banks and insurers run today can barely power last year’s dashboards, let alone tomorrow’s AI systems.

Warehouses weren’t built for embeddings.
SANs weren’t built for GPUs.
CPU pipelines weren’t built for real-time semantics.

InsightGrid isn’t a tweak.
It’s an engineering reset of the data lakehouse, rebuilt for GPUs, real-time execution, and semantic alignment.

No SANs.
No JVMs.
No architectural debt.

Just GPU-native infrastructure designed for speed, meaning, and sustainability.

Next steps

We will present this as a demo and a speaking session at NVIDIA GTC 2026: The Agentic AI Data Factory: Why Agents Need a GPU-Native Data Platform to Create Real Value, on Monday, March 16th at 2:00 pm.

Register here