Auditing ChatGPT – Part I

Grégoire Martinon, Aymen Mejri, Hadrien Strichard, Alex Marandon, Hao Li
Jan 12, 2024

A Chorus of Disruption: From Cave Paintings to Large Language Models

Since its release in November 2022, ChatGPT has revolutionized our society, captivating users with its remarkable capabilities. Its rapid and widespread adoption is a testament to its transformative potential. At the core of this chatbot lies the GPT-4 language model (or GPT-3.5 for the free version), developed by OpenAI. We have since witnessed an explosive proliferation of comparable models, such as Google Bard, Llama, and Claude. But what exactly are these models and what possibilities do they offer? More importantly, are the publicized risks justifiable and what measures can be taken to ensure safe and accountable utilization of these models?

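In this first part of our two-part article, we discuss what Large Language Models are, the technological history behind ChatGPT, why everyone is talking about these models, and what you can do with them.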

What are Large Language Models (LLMs)?

Artificial intelligence (AI) is a technological field that aims to give human intelligence capabilities to machines. A generative AI is an artificial intelligence that can generate content, such as text or images. Within generative AIs, foundation models are recent developments often described as the fundamental building blocks behind such applications as DALL-E or Midjourney. In the case of text-generating AI, these are referred to as Large Language Models (LLMs), of which the Generative Pre-trained Transformer (GPT) is one example made popular by ChatGPT. More complete definitions of these concepts are given in Figure 1 below.

Figure 1: Definitions of key concepts around LLMs [4]

The technological history of the ChatGPT LLM

In 2017, a team of researchers introduced a new type of Natural Language Processing (NLP) model called the Transformer. It achieved spectacular performance on tasks involving sequential data, such as text or time series. By relying on the ‘attention mechanism’, a technique published in 2015, the Transformer pushed past the limits of previous models, particularly in the length of the texts it could process and generate.

In 2018, OpenAI created a model inspired by the Transformer architecture (the decoder stack in particular). The main reason was that the Transformer decoder, with its masked attention, excels at text generation. The result was the first Generative Pre-trained Transformer. The same year saw the release of BERT, a Google NLP model that was also inspired by the Transformer. Together, BERT and GPT launched the era of LLMs.
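To make the idea of masked attention concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not OpenAI's implementation: the function names and the toy two-dimensional vectors are invented for this example, and the learned projections and multiple heads of a real Transformer are left out. The point it shows is that each position can only attend to itself and earlier positions, which is what allows a decoder to generate text one token at a time.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(queries, keys):
    """Return one row of attention weights per position.

    Position i may only attend to positions 0..i (the causal mask),
    so every weight for a later position is exactly 0.0.
    """
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot-product scores against the visible prefix only;
        # this is equivalent to masking later positions with -infinity.
        scores = [
            sum(qa * ka for qa, ka in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        row = softmax(scores) + [0.0] * (len(keys) - i - 1)
        weights.append(row)
    return weights

# Toy 2-dimensional vectors for a 3-token sequence (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = causal_attention_weights(tokens, tokens)
for row in attention:
    print([round(w, 3) for w in row])
```

The first row puts all of its weight on the first token, since nothing else is visible yet, and every row sums to one.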

To outperform BERT and its variants, OpenAI released GPT-2 in 2019 and GPT-3 in 2020. These two models benefited from an important breakthrough: meta-learning. Meta-learning is a Machine Learning (ML) paradigm in which the model “learns how to learn.” For example, the model can handle tasks other than those for which it was explicitly trained.

OpenAI’s aim is for GPT models to be able to perform any NLP task with only an instruction and possibly a few examples, with no need for a task-specific dataset to train them. OpenAI has succeeded in making meta-learning a strength, thanks to increasingly large architectures and massive datasets collected from the Internet.
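In practice, "an instruction and possibly a few examples" usually takes the form of a few-shot prompt. The sketch below shows one plausible way to assemble such a prompt; the helper name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sample reviews are assumptions made for illustration, not a format prescribed by OpenAI.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new input into one prompt."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final input is left without an output so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery lasts all day.", "positive"),
        ("The screen cracked within a week.", "negative"),
    ],
    query="Setup took five minutes and everything worked.",
)
print(prompt)
```

No model weights are updated here: the task is specified entirely in the text sent to the model, which is why no task-specific training set is needed.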

To take its technology further, OpenAI moved beyond NLP by adapting its models for images. In 2021 and 2022, OpenAI published DALL-E 1 and DALL-E 2, two text-to-image generators. [10] These generators enabled OpenAI to make GPT-4 a multi-modal model, one that can understand several types of data.

Next, OpenAI released InstructGPT (GPT-3.5), which was designed to better meet user demands and mitigate risk. This was the version OpenAI launched in late 2022. In March 2023, OpenAI released an even more powerful and secure version: the premium GPT-4. Unlike preceding versions, GPT-3.5 and GPT-4 attracted strong commercial interest. OpenAI has now adopted a closed-source ethos, no longer revealing how its models work, and has become a for-profit company (it was originally a non-profit association). Looking to the future, we can expect OpenAI to push the idea of a single prompt for all tasks and all types of data even further.

Why is everyone talking about Large Language Models?

Only those currently living under a rock will not have heard something about ChatGPT in recent months. The fact that it made half the business world ecstatic and the other half anxious should tell you how popular it has become. But let’s take a closer look at the reasons why. 

OpenAI’s two remarkable feats

With the development of meta-learning, OpenAI created an ultra-versatile model capable of providing accurate responses to all kinds of requests – even those it has never encountered before. In fact, GPT-4 achieves better results on specific tasks than specialized models. 

In addition to these technological leaps, OpenAI has democratized access to them. By deploying its technology in the form of an accessible chatbot (ChatGPT) with a simple interface, OpenAI has made it possible for everyone to use this powerful language model’s capabilities. This public access also enables OpenAI to collect more data and feedback, which are used to improve the model.

Rapid adoption  

The rapid adoption of GPT technology via the ChatGPT LLM has been unprecedented. Never has an internet platform or technology been adopted so rapidly (see Figure 2). ChatGPT now boasts 200 million users and two billion visits per month.  

Figure 2: Speed of reaching 100 million users in months. [13]

The number of Large Language Models is exploding, with competitors coming from Google (Bard), Meta (Llama), and Hugging Face (HuggingChat, a French open-source alternative). There is also a surge in new applications. For example, LLMs have been integrated into search engines and into Auto-GPT, which turns GPT-4 into an autonomous agent. This remarkable progress is stimulating a new wave of research, with LLM publications growing exponentially (Figure 3).

Figure 3: Cumulative number of scientific publications on LLMs.

Opportunities, fantasies, and fears

The new standard established by GPT-4 has broadened the range of possible use cases. As a result, many institutions are looking to exploit them. For example, some hospitals are using them to improve and automate the extraction of medical conditions from patient records.  

On the other hand, these same breakthroughs in performance have given rise to a host of fears: job insecurity, exam cheating, privacy threats, etc. Many recent articles explore this growing anxiety, which now seems justified – Elon Musk and Geoffrey Hinton are just two of the many influential tech figures now raising the alarm, calling it a new ‘code red.’  

However, as is often the case with technological advances, people have trouble distinguishing between real risk and irrational fear (e.g., a world in which humans hide from robots like those in The Terminator). That scenario imagines a machine that rivals or surpasses the human brain, an idea inextricably linked with the emergence of consciousness. It is worth noting that such a machine, known as AGI (Artificial General Intelligence), is the ultimate goal of OpenAI.

Whether these scenarios remain fantasies or become reality, GPT-4 and other Large Language Models are undoubtedly revolutionizing our society and represent a considerable technological milestone.

What can you do with an LLM?

Essentially, an LLM such as the one behind ChatGPT can:

  1. Generate natural language content: LLMs are trained specifically for this purpose, so this is where they excel. They strive to adhere to the given constraints as accurately as possible.
  2. Reformulate content: This involves providing the LLM with a base text and instruction to perform tasks, such as summarizing, translating, substituting terms, or correcting errors.
  3. Retrieve content: It is possible to request an LLM to search for and retrieve specific information based on a corpus of data.

How can you use an LLM?      

There are three possible ways to apply Large Language Models, summarized in Figure 4. The first is direct application, where the LLM is used only for the tasks it can perform itself. This is, a priori, the use case of a chatbot like ChatGPT, which directly implements GPT-4 technology. While this is one of the most common applications, it is also one of the riskiest, because the LLM often acts like a black box and is difficult to evaluate.

One emerging use of LLMs is the auxiliary application. To limit risks, an LLM is implemented here as an auxiliary tool within a system. For example, in a search engine, an LLM can be used as an interface for presenting the results of a search. This use case was applied to the corpus of IPCC reports. [19] The disadvantage here is that the LLM is far from being fully exploited.

In the near future, the orchestral application of LLMs will consume much of the research budget for large organizations. In an orchestral application, the LLM is both the interface with the user and the brain of the system in which it is implemented. The LLM therefore understands the task, calls on auxiliary tools in its system (e.g., Wolfram Alpha for mathematical calculations), and then delivers the result. Here, the LLM acts less like a black box, but the risk assessment of such a system will also depend on the auxiliary tools. The best example to date is Auto-GPT.
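The orchestral loop (understand the task, call a tool, deliver the result) can be sketched as follows. Everything here is a simplified assumption for illustration: `stub_llm_plan` is a hard-coded stand-in for the model's decision, the local `calculator` function takes the place of an external service such as Wolfram Alpha, and none of the names reflect how Auto-GPT is actually built.

```python
import ast
import operator

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression):
    """Auxiliary tool: evaluate a basic arithmetic expression without eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}

def stub_llm_plan(request):
    """Stand-in for the LLM: pick a tool and its input for the request.

    A real system would ask the model to make this decision; the keyword
    rule here exists only to keep the sketch runnable offline.
    """
    prefix = "compute "
    if request.lower().startswith(prefix):
        return {"tool": "calculator", "input": request[len(prefix):]}
    return {"tool": None, "input": request}

def orchestrate(request):
    """Understand the task, call an auxiliary tool if needed, return the result."""
    plan = stub_llm_plan(request)
    if plan["tool"] is None:
        return "No tool needed; the model would answer directly."
    result = TOOLS[plan["tool"]](plan["input"])
    return f"{plan['input']} = {result}"

print(orchestrate("Compute 12 * (3 + 4)"))
```

Because the arithmetic is delegated to a deterministic tool, that part of the answer can be audited independently of the model, which illustrates why the risk assessment of such a system also depends on its tools.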

Figure 4: The three possible applications of an LLM

Focusing on the use case of a chatbot citing its sources

One specific use case emerging among our customers is a chatbot that cites its sources. It responds to the lack of interpretability of Large Language Models (i.e., the difficulty of understanding which sources the LLM has used and why).

Figure 5: Technical diagram of a conversational agent quoting its sources

The pattern behind a chatbot citing its sources, illustrated in Figure 5, is called Retrieval Augmented Generation (‘RAG’). The system takes a user request as input and transforms it into an embedding (i.e., a vectorization of words or sentences that captures semantic and syntactic relationships). It also holds a corpus of texts already transformed into embeddings. The goal is then to find the embeddings within the corpus that are closest to the query embedding, which usually relies on nearest-neighbour search algorithms. Once we have identified the corpus elements that can help with the response, we pass them to an LLM to synthesize the answer. Alongside the response, we can provide the elements that were used to generate it. The LLM then serves as an interface for presenting the search engine’s results. This ‘RAG’ approach therefore decouples the factual information provided by the sources from the semantic analysis provided by the LLM, leading to better auditability of the results provided by the chatbot.
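The retrieval half of this pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system shown in Figure 5: the bag-of-words `embed` function stands in for a learned embedding model, the exhaustive cosine-similarity ranking stands in for a nearest-neighbour index, the three-document corpus is invented, and the final call to an LLM is left out so that the example runs offline.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, top_k=2):
    """Return the top_k (source_id, passage) pairs closest to the query."""
    query_vec = embed(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query_vec, embed(item[1])),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query, passages):
    """Ask the LLM to answer only from the retrieved passages, citing their ids."""
    sources = "\n".join(f"[{source_id}] {text}" for source_id, text in passages)
    return (
        "Answer the question using only the sources below and cite their ids.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )

# Hypothetical three-document corpus, invented for illustration.
corpus = {
    "doc1": "Sea level rise is accelerating because of melting ice sheets.",
    "doc2": "The Transformer architecture relies on attention.",
    "doc3": "Melting glaciers contribute to sea level rise worldwide.",
}
question = "What causes sea level rise?"
passages = retrieve(question, corpus)
print(build_prompt(question, passages))
```

The retrieved passages keep their identifiers all the way into the prompt, so the chatbot's answer can be checked against the exact sources it was given.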

Read more in Auditing ChatGPT – part II


Grégoire Martinon

Trusted Senior AI Expert at Quantmetry
Grégoire leads Quantmetry’s trustworthy AI expertise. He ensures that missions keep close to the state of the art and directs the scientific orientations on the theme of trustworthy AI. He is the R&D referent for all projects aimed at establishing an AI certification methodology.

Aymen Mejri

Data Scientist at Quantmetry
Holding two master of engineering degrees and a master of science in data science, I have decided to steer my career towards the field of artificial intelligence, specifically towards natural language processing. I am part of the NLP expertise at Quantmetry.

Hadrien Strichard

Data Scientist Intern at Capgemini Invent
Hadrien joined Capgemini Invent for his gap year internship in the “Data Science for Business” master’s program (X – HEC). His taste for literature and language led him to make LLMs the main focus of his internship. More specifically, he wants to help make these AIs more ethical and secure.

Alex Marandon

Vice President & Global Head of generative AI Accelerator, Capgemini Invent
Alex brings over 20 years of experience in the tech and data space, beginning his career as a CTO in startups and later leading data science and engineering in the travel sector. Eight years ago, he joined Capgemini Invent, where he has been at the forefront of driving digital innovation and transformation for his clients. He has a strong track record in designing large-scale data ecosystems, especially within the industrial sector. Currently, as the Global Lead of Capgemini Invent’s generative AI Acceleration Lab, Alex crafts Gen AI go-to-market strategies, develops assets, upskills teams, and assists clients in scaling AI and Gen AI solutions from proof of concept to value generation.

Hao Li

Data Scientist Manager at Capgemini Invent
Hao is Lead Data Scientist, referent on NLP topics and specifically on strategy, acculturation, methodology, business development, R&D and training on the theme of Generative AI. He leads innovation solutions by confronting Generative AI, traditional AI and Data.
