
Conversational twins
The virtual engineering assistants of the (near) future

David Granger
May 15, 2025

What will happen when Gen AI meets VR meets Digital Twins meets high-powered chips? Enter the ‘Conversational Twin’ – a virtual, 3D, generative AI assistant that can visually guide you through complex tasks.

Imagine your car breaks down and, to save a bit of money, you decide to fix it yourself.

You head to your garage with your smartphone and start looking up YouTube tutorials. Eventually, you find one that covers your problem and start watching. As you get to the key part, you start fiddling around with the engine. The presenter explains much faster than you can act, so you keep going back to your phone to scroll back 20 seconds and rewatch. After watching the key bit several times, you find the problem. A new part is needed. You spend an hour online trying to find the right one amongst 100 identical-looking options with names like ‘CC01-15/06’ and ‘CC01-15/06e’. A few days later, it arrives and it’s back to the garage. Another hour of fixing and scrolling, and your car is finally ready to go.

For all their flaws, the popularity of online tutorials shows the enormous demand for information on how to fix things, and, perhaps, a deeper need to feel in control. And that’s just private citizens. Mechanics and engineers have an even greater need to access vast amounts of information on a vast range of processes and parts, and how to apply them to different models of cars, aircraft, machine tools, etc.

YouTube is certainly better than thick, boring instruction manuals. But really, people want to interact in natural human ways. People process information differently and start with different levels of knowledge, which makes start-to-finish tutorials an inefficient way to deliver information. In an ideal world, you would have someone nearby who understands the problem, can explain what to do as you go, and can answer questions when you don’t understand an instruction.

Could digitally delivered instructions become more like that human expert? We think so, particularly given advances in generative AI, virtual reality, digital twins, and advanced chips.

The instruction manual of the future

We can imagine that, in the not-too-distant future, that same smartphone could contain an app with a digital twin of the car, trained on the car’s instruction manuals (we’ll stick with the car example, but this could be applied to any complex engineered product).

The result: when you hit a problem that needs fixing, you open the app and verbally describe the problem to a virtual assistant. The app then generates a visual step-by-step guide to solving it, communicated through a mix of AR overlays, demonstrations by avatars, and spoken instructions, via your phone, tablet, or VR headset.

Rather than simply playing back a fixed series of steps, it would use generative AI to contextualize your challenge and explain exactly what you need to do to fix it, using 3D visualizations that adjust to the questions you ask, then waiting patiently for you to finish one task and ask for the next instruction (or for clarification of the last one). We call this a Conversational Twin, because you are effectively conversing with a digital twin of the car, one that knows everything about it.
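To make the interaction pattern concrete, here is a minimal sketch of such a guidance loop in Python. Everything here is illustrative: the hard-coded GUIDE dictionary stands in for steps a real system would generate from the manuals and the digital twin, and the overlay ids are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    instruction: str   # what the assistant says for this step
    overlay: str       # id of the 3D overlay to display on the car model

# Hypothetical knowledge base: in a real system these steps would be
# generated by the AI from manuals and the digital twin, not hard-coded.
GUIDE = {
    "coolant leak": [
        Step("Open the bonnet and locate the coolant reservoir.", "reservoir_highlight"),
        Step("Check the hose clamp below the reservoir for drips.", "hose_clamp_arrow"),
    ]
}

def conversational_twin(problem: str) -> None:
    """Walk the user through a repair one step at a time, waiting for
    confirmation (or a question) before moving on."""
    for step in GUIDE.get(problem.lower(), []):
        print(f"ASSISTANT: {step.instruction}  [overlay: {step.overlay}]")
        reply = input("YOU (done / question): ")
        while reply != "done":
            # A real system would send the question back to the LLM with
            # the current step as context; here we just acknowledge it.
            print(f"ASSISTANT: (answers question about: {reply!r})")
            reply = input("YOU (done / question): ")

conversational_twin("coolant leak")
```

The key property is the blocking wait on the user, not the model: the twin only advances when you say you are ready.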

By harnessing the phone camera, the app could even watch your movements and guide you in real time (“unscrew the cap, no not that one, the one 10 cm to your left”) by comparing the video feed with its internal model of the vehicle. When you reach the problem, you could hold up the broken part and the Twin would recognize it and order a replacement for you.
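A toy sketch of that comparison, assuming a vision model that reports which part the user is reaching for; detect_hand_target() and the coordinates below are fabricated stand-ins:

```python
# Where the twin's 3D model says each part sits, in metres, in the
# car's coordinate frame. Part names and positions are hypothetical.
EXPECTED = {"oil_filler_cap": (0.42, 0.10, 0.88)}

def detect_hand_target():
    """Fake detector: returns the part the user is reaching for and its
    position. A real system would run a vision model on the camera feed."""
    return "coolant_cap", (0.32, 0.10, 0.88)

def guidance(target_part: str) -> str:
    seen, pos = detect_hand_target()
    if seen == target_part:
        return "Yes, that one. Unscrew it anticlockwise."
    # Compare the observed position with where the target should be,
    # and turn the difference into a spoken correction.
    dx = EXPECTED[target_part][0] - pos[0]
    side = "right" if dx > 0 else "left"
    return f"Not that one, the cap {abs(dx)*100:.0f} cm to your {side}."

print(guidance("oil_filler_cap"))
```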

Such a Conversational Twin will be a significant benefit to people who want to fix things themselves. But its real value will be as a cost saver for companies with large maintenance and engineering teams: it puts far more expertise at each person’s fingertips, enabling smaller teams to perform more tasks, more quickly, even on problems they have never seen before.

How to do it

Technically, most of what is described above could be created today. But it would be a lot of work. Each product would need to be carefully mapped and digitized, conversational flows would need to be carefully scripted and programmed, and a library of animations would need to be pre-designed. 

Generative AI is rapidly changing the game here. Already, dedicated AI models can be trained on everything from manuals to YouTube videos to online trade forums, so they can find answers as they are requested and return them as contextualized text or spoken instructions.
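The underlying pattern is retrieval-augmented generation: fetch the most relevant passages, then hand them to the model as context. A toy sketch, with naive word overlap standing in for learned embeddings and a three-line corpus standing in for real manuals:

```python
import re
from collections import Counter

# Toy corpus standing in for manuals, video transcripts, and forum posts.
DOCS = [
    "To replace the cabin air filter, open the glovebox and release the side clips.",
    "Coolant reservoir: check level when engine is cold; top up to the MAX line.",
    "Brake pads should be replaced when the friction material is under 3 mm.",
]

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question. A production
    system would use learned embeddings and a vector index instead."""
    q = tokens(question)
    scored = sorted(DOCS, key=lambda d: -sum((tokens(d) & q).values()))
    return scored[:k]

def build_prompt(question: str) -> str:
    """Assemble the context an LLM would be asked to answer from."""
    context = "\n".join(f"- {d}" for d in retrieve(question))
    return f"Using only this context:\n{context}\nAnswer: {question}"

print(build_prompt("How do I top up the coolant?"))
```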

The more challenging part is mapping those text-based instructions onto 3D models of the product. The Conversational Twin would need to interpret a mix of text and visual inputs, turn them into prompts for itself, find the answers, map the resulting instructions onto its internal 3D model of the specific car, and then overlay its responses as 3D objects onto the physical car it sees through the camera. We are not quite there yet.

But such technology is coming. Virtual and augmented reality have come on in leaps and bounds in the past few years, and it is only a matter of time before virtual objects can be generated in response to generative AI instructions. Equally, today’s large language models (LLMs) deal in text, but they will need to output machine-readable instructions in order to drive virtual overlays. That is not something LLMs do out of the box, but bright minds – including those at Capgemini – are working on making that connection between LLMs and Real-time 3D engines. Once these two areas advance a little further, it becomes a matter of carefully connecting everything.
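One plausible shape for that LLM-to-engine connection is to constrain the model to emit structured JSON that a real-time 3D engine can act on. The field names, actions, and part ids below are assumptions for illustration, not an existing format:

```python
import json

# One plausible machine-readable format an LLM could be constrained to
# emit so a 3D engine can act on it. All field names are illustrative.
llm_output = """
{
  "action": "highlight",
  "part_id": "oil_filler_cap",
  "annotation": "Unscrew this cap anticlockwise",
  "anchor_offset_m": [0.0, 0.05, 0.0]
}
"""

def to_engine_command(raw: str) -> dict:
    """Parse the LLM's JSON and map it onto a (hypothetical) engine call."""
    msg = json.loads(raw)
    assert msg["action"] in {"highlight", "arrow", "animate"}, "unknown action"
    return {
        "command": f"scene.{msg['action']}",
        "target": msg["part_id"],          # must exist in the 3D model's part tree
        "label": msg["annotation"],
        "offset": msg["anchor_offset_m"],  # metres, in the part's local frame
    }

print(to_engine_command(llm_output))
```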

Of course, generative AI is not a ‘magic bullet’ that can simply be told what to do and automatically produce the result you want. It will need a well-defined architecture and effective rules for how to ‘prompt’ it to generate the right responses, output in ways that can be reliably converted into 3D visuals.
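In practice, that could mean validating every model response against the digital twin’s part list before rendering it, and re-prompting on failure. A sketch of such a guardrail loop; ask_llm, KNOWN_PARTS, and the retry strategy are all hypothetical:

```python
REQUIRED = {"action", "part_id", "annotation"}
KNOWN_PARTS = {"oil_filler_cap", "coolant_reservoir"}  # from the digital twin

def validate(msg: dict) -> list[str]:
    """Return a list of problems; an empty list means safe to render."""
    errors = [f"missing field: {f}" for f in REQUIRED - msg.keys()]
    if msg.get("part_id") not in KNOWN_PARTS:
        errors.append(f"unknown part: {msg.get('part_id')}")
    return errors

def ask_with_retries(ask_llm, prompt: str, attempts: int = 3) -> dict:
    """Re-prompt until the model's output passes validation."""
    for _ in range(attempts):
        msg = ask_llm(prompt)
        problems = validate(msg)
        if not problems:
            return msg
        # Feed the errors back so the model can correct itself.
        prompt += "\nFix these problems and respond again: " + "; ".join(problems)
    raise RuntimeError("model never produced a valid instruction")

# Stand-in for a real model call, just to exercise the loop.
fake_llm = lambda prompt: {"action": "highlight",
                           "part_id": "coolant_reservoir",
                           "annotation": "Check the level"}
print(ask_with_retries(fake_llm, "Highlight the coolant reservoir"))
```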

Finally, we still need some microchip advances to deliver all of this on a device. Today, we use edge computing devices and the cloud to process these advanced workloads, and much can indeed be done with these approaches to lay the foundations for Conversational Twins. But we expect that in the next few years chips will be sufficiently advanced to do all the processing on a smartphone, tablet, or VR headset.

What to do to get ready for Conversational Twins

Even if Conversational Twins are a few years away, there is a lot that companies can do now to prepare for them, which will also have immediate value elsewhere.

The first is investing in Real-time 3D. This is a rapidly growing technology with exciting possibilities, such as the ability to showcase products to customers without them leaving their homes, or to create virtual working environments that train employees without risk.

A related step is to start preparing existing assets for training Gen AI and for building 3D content. Many companies already have 3D product models, rendered marketing materials, and so on, but these are often held in silos, in inconsistent formats and of inconsistent quality. Complex projects like Conversational Twins will not be reliable if the underlying 3D model of the product – on which they base their recommendations – does not match the real product.

Those that have not already done so should create centralized virtual models of their products and businesses, as a single source of truth. That way, anyone in the company producing 3D materials – whether for new product design, marketing, or Gen AI-powered assistants – works from the same high-quality version. In time, this ‘virtual twin’ will provide the digital foundation for your Conversational Twin.
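As a first step, even a simple audit of existing 3D assets can surface the silo problem described above. A toy sketch; the inventory is fabricated, and real metadata would come from a PLM or DAM system:

```python
# Toy asset inventory; real metadata would come from a PLM/DAM system.
ASSETS = [
    {"name": "engine_block", "format": "gltf", "units": "m",  "source": "PLM"},
    {"name": "engine_block", "format": "fbx",  "units": "cm", "source": "marketing"},
    {"name": "door_panel",   "format": "gltf", "units": "m",  "source": "PLM"},
]

def audit(assets: list[dict]) -> None:
    """Flag parts whose duplicate copies disagree on units or format,
    exactly the inconsistency that breaks a single source of truth."""
    by_name: dict[str, list[dict]] = {}
    for a in assets:
        by_name.setdefault(a["name"], []).append(a)
    for name, copies in by_name.items():
        units = {c["units"] for c in copies}
        formats = {c["format"] for c in copies}
        if len(units) > 1 or len(formats) > 1:
            print(f"{name}: inconsistent copies, units={units}, formats={formats}")

audit(ASSETS)
```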

Why you should start now

Once the above comes to fruition, companies making products like cars or planes could offer a corresponding app that guides users through maintaining and fixing them. That could be sold as a subscription to professional mechanics, maintenance engineers, and training organizations, and made available free or for a fee to people who have bought the product, as a differentiator from the competition.

Many aerospace and industrial companies are already exploring how to simplify the maintenance, training, and configuration of products, rather than relying on complicated documents or fixed training modules. As engineering companies move from selling products to managing the entire lifecycle, Conversational Twins can offer customers added value: saving them time and money, extending the life of products, and providing a valuable source of data for improving future designs.

If we start getting our data and models ready now, and embark on proof-of-concept projects, Conversational Twins could be with us this decade. Along the way, we will need to guide organizations to integrate AI carefully, following sensible adoption and risk management frameworks and deploying appropriate training, so that both its potential and its limitations are carefully navigated.

Discover the next generation of user experiences powered by real-time 3D. Click to learn more about Capgemini Engineering’s Real-time 3D solutions.


Meet the author

David Granger

Director of Engineering – Experience Engineering
David and his expert team lead the development of advanced solutions that integrate real-time 3D (RT3D) visualization with generative AI, a practice known as ‘Experience Engineering’, to drive innovation across industries. His team specializes in crafting intelligent experiences that reshape how businesses engage with digital content.