
Code to form: The rise of AI robotics and physical AI 

Dheeren Vélu, Kary Bheemaiah and Alexandre Embry
Jun 19, 2025

Earlier this year, we highlighted how AI-driven robotics is reshaping the boundaries between humans and machines, ushering in a new era of intelligent, adaptive systems. This evolution is now taking shape through the rise of physical AI. In this blog, we’ll showcase the core components of physical AI and explore how business leaders can take advantage of this transformative shift. 

The last few years have been a boon for AI. We now have foundational models for generating high-quality and reliable text, images, and video. We are also starting to benefit from agentic AI systems that can operate with autonomy, undertake goal-directed behavior, and adapt to changing circumstances.  

But what about robotics? How have we helped robots act more like humans? The answer is that we have made tentative progress: we have systems that enable robots to handle specific tasks within constrained environments.  

However, asking robots to go beyond these confines and act more like us is problematic. The world is built for humans, and changing the environment to suit a new age of robotics would be too expensive. Enabling robots to become more like us requires a nuanced approach – physical AI, where AI is integrated into physical systems for environmental interaction. 

Creating a foundational model for robotics means enabling machines to adapt to our world. We must visualize and codify the millions of tasks we perform daily and use this data to help robots perform tasks collaboratively.  

This is no straightforward task. The journey to a foundational model for robotics will involve many hurdles. Progress will rely on contributions from multiple parties. Let’s consider how your business can get involved and how advances in AI and edge computing can help us build foundational models that support the long-term emergence of physical AI.  

A new state of urgency around robotics 

Robotics is on the verge of a major leap, much like the recent revolution in generative AI. The cost of core hardware components, such as batteries and actuators, which account for up to 50% of the bill of materials, has fallen by double-digit percentages over the past few years.  

This cost reduction means innovations that might have taken the better part of a century will be completed in a decade or less. We are moving to a future where robots can work alongside humans, bridging the digital and physical worlds. With advanced sensory integration, deep reasoning, and dexterous manipulation, these robots will adapt to their environment, moving quickly and effectively.  

Agile companies in certain regions, particularly China, are already optimizing their supply chains and iterating on robotic innovations. The companies that create value from physical AI will win the race. So, how can your business stay competitive and thrive in this new era? 

Towards general-purpose robotics 

Business leaders must recognize that AI and humanoids are coming together to move robotics beyond structured manipulation and autonomous navigation to general-purpose robotics, where navigation and manipulation seamlessly integrate. 

This co-evolution of AI and humanoids should be your guide to effective competition in the race towards physical AI. AI, rather than hardware, will be the key differentiator that powers the move to general-purpose robotics, and its foundational model will draw on three main engines: 

  • Reinforcement Learning – Rewarding robots for achieving a set business goal and not just completing a discrete task. 
  • Agents – Using pioneering work on the industrial metaverse and digital twins as the source data to create the simulations that build the foundational model for robotics. 
  • Vision-Language-Action (VLA) – Combining vision models that detect an object, scene, or visual input with neural networks trained on thousands of actions. 
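The reinforcement-learning point above, rewarding a business goal rather than a discrete task, can be sketched in a few lines. This is a purely illustrative toy: the `EpisodeOutcome` fields, reward weights, and the warehouse-picking scenario are all assumptions, not from any real robotics stack.

```python
# Toy reward function for the goal-level reinforcement-learning idea:
# reward fulfilling the whole order intact, not each grasp in isolation.
from dataclasses import dataclass

@dataclass
class EpisodeOutcome:
    items_picked: int
    items_ordered: int
    items_damaged: int
    seconds_taken: float

def goal_level_reward(o: EpisodeOutcome) -> float:
    # Bonus only if the full business goal is met (order complete, nothing
    # damaged); penalties discourage damage and wasted time.
    fulfilled = o.items_picked == o.items_ordered and o.items_damaged == 0
    return (10.0 if fulfilled else 0.0) - 0.5 * o.items_damaged - 0.01 * o.seconds_taken

# A clean, complete order earns the goal bonus minus a small time penalty:
r = goal_level_reward(EpisodeOutcome(5, 5, 0, 120.0))
```

Shaping the reward around the outcome, rather than individual motions, is what lets a learned policy discover its own strategies for reaching the goal.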

This huge amount of training enables VLA models to drive robot actions without explicit programming. Robots can work on the fly to complete actions they have not previously encountered. These pre-trained robots can collaborate, with their foundational models interacting to complete tasks effectively. 
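The VLA pattern described above can be sketched as a single policy that maps an image observation plus a language instruction to a low-level action. Every name below (`Observation`, `ToyVLAPolicy`, the instruction-derived gain) is a hypothetical stand-in to show the interface, not a real VLA implementation.

```python
# Illustrative shape of a vision-language-action (VLA) policy:
# one forward call maps (visual input, instruction) -> action,
# with no task-specific programming.
from dataclasses import dataclass

@dataclass
class Observation:
    image_features: list[float]   # stand-in for a vision encoder's output
    instruction: str              # natural-language task description

@dataclass
class Action:
    joint_deltas: list[float]     # per-joint position changes

class ToyVLAPolicy:
    """Stand-in for a pre-trained VLA model: one call per control step."""
    def act(self, obs: Observation) -> Action:
        # A real VLA decodes action tokens with a transformer; here we
        # just scale the visual features by a crude instruction-derived
        # gain so the example stays runnable and self-contained.
        gain = 0.5 if "slowly" in obs.instruction else 1.0
        return Action(joint_deltas=[gain * f for f in obs.image_features])

policy = ToyVLAPolicy()
obs = Observation(image_features=[0.2, -0.1, 0.4],
                  instruction="pick up the cup slowly")
action = policy.act(obs)
```

The key property is the interface: the same pre-trained policy handles new objects and new instructions without being reprogrammed for each task.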

Asking the right questions about physical AI 

As we stated at the outset, the journey towards humanoids is challenging. General-purpose robotics helps us to visualize how a foundation model for robotics is a realistic end objective, but we can’t be complacent. 

Let’s take a step back and think about how humans operate: some things we do intuitively, like recognizing a friend’s face in a crowd, and some things require a more deliberate approach, such as solving a math problem. The same is true of robotics in an age of physical AI: some systems will need fast, reactive responses and can be handled by transformer models; other systems are more analytical and will require huge compute power and pre-trained VLAs to interpret complex scenarios and plan accordingly. 

Cost will be a crucial concern. The foundational models that support generative and agentic AI often rely on the cloud, and pushing processing requirements to the cloud creates huge costs. These costs will be prohibitive in an age of physical AI, where robots will need to connect to supporting computer systems rapidly and repeatedly. The key to handling these ever-growing computational requirements is edge computing, which will help your business manage data demands cost-effectively by moving compute to where the data resides, reducing the reliance on expensive cloud services and minimizing data transmission costs.  
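A back-of-envelope calculation shows why moving compute to the edge matters for data transmission. All figures below (frame rate, frame size, duty cycle) are illustrative assumptions, not benchmarks.

```python
# Edge-vs-cloud transfer volume: a robot that ships every raw frame to
# the cloud uploads orders of magnitude more data than one that runs
# inference on-device and sends only compact results.

def monthly_transfer_gb(frames_per_sec: float, frame_mb: float,
                        hours_per_day: float, days: int = 30) -> float:
    """Data volume (GB) uploaded over a month at the given rates."""
    seconds = hours_per_day * 3600 * days
    return frames_per_sec * frame_mb * seconds / 1024

# Cloud inference: one robot streaming 10 fps of 2 MB frames, 16 h/day.
cloud_gb = monthly_transfer_gb(10, 2.0, 16)

# Edge inference: only ~1 KB of results per frame leaves the device.
edge_gb = monthly_transfer_gb(10, 1 / 1024, 16)
```

Under these assumptions the cloud path uploads tens of terabytes per robot per month while the edge path uploads a few tens of gigabytes, which is the cost gap the paragraph above refers to.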

At Capgemini, we’re not here to give you all the answers. We’re here to help you ask the right questions, because that’s when you get creative, unlock inspirational ideas, and generate new value.  

As we enter the era of physical AI, we can help you explore the possibilities, challenge assumptions, and shape the future of robotics—together. 

“We must start thinking about how we’ll make the foundational model for robotics. Nobody knows exactly when it will happen, but if we think about AI and the progress during the past three years, I think the call for action starts becoming more and more important.”  – Kary Bheemaiah, CTIO Capgemini Invent 

“VLA is the core of the foundational models powering these humanoids. The world is full of actions, and VLA is just a coming together of all this data. It’s an ingenious idea where now you can look at an object and drive an action without programming.”  – Dheeren Velu, Head of Innovation & AI, AUNZ 

“At the intersection of AI, robotics, and computer vision, advanced training methods like reinforcement learning and VLA are driving a profound transformation: automation endowed with arms, legs – and reasoning. No programming is required.  But the true breakthrough isn’t just technological – it’s collaborative. Humans, humanoid robots, and virtual AI agents are about to operate as cohesive teams, elevating efficiency, precision, and safety to unprecedented levels.” – Alexandre Embry, CTIO, Head of the Capgemini AI Robotics and Experiences Lab  


Meet the authors

Dheeren Vélu

Web3 & NFT stream Lead of Capgemini Metaverse Lab
Dheeren is the Web3 & NFT Lead at Capgemini Metaverse Lab. An innovation expert and Web3 strategist with an extensive background in artificial intelligence, he focuses on developing novel concepts and business models in the emerging Web3 and metaverse areas, bringing to life innovative solutions and products underpinned by decentralized capabilities such as smart contracts, tokens, and NFTs.

Kary Bheemaiah

Chief Technology and Innovation Officer (CTIO), Capgemini Invent
Kary is Chief Technology and Innovation Officer at Capgemini Invent, where he helps define the business and sectoral implications of emerging technologies that are of strategic importance to Capgemini’s clients. Kary informs go-to-market implementation plans, aids in the creation of new client-specific solutions and services, and contributes to thought leadership on innovative technologies. He also connects the world of emerging technology to all Capgemini Invent practices, clients, and sectors.

Alexandre Embry

CTIO, Head of AI Robotics and Experiences Lab
Alexandre Embry is CTIO and a member of the Capgemini Technology, Innovation and Ventures Council. He leads the Immersive Technologies domain, analyzing trends and developing the deployment strategy at Group level. He specializes in exploring emerging tech trends and advising organizations on their transformative power. He is passionate about enhancing the user experience and identifying how metaverse, Web3, NFT, and blockchain technologies, as well as AR/VR/MR, can advance brands and companies through enhanced customer and employee experiences. He is the founder and head of Capgemini’s Metaverse Lab and of the Capgemini Andy3D immersive remote collaboration solution.