Why we should not be surprised by the success of ChatGPT and generative AI

Robert Engels
Feb 1, 2024

In the whirlwind of technological progress, the seemingly sudden rise of phenomena like ChatGPT and Generative AI can create an impression, to people outside the field at least, of surprise and novelty. However, beneath this surface lies a rich tapestry of historical development, with roots tracing back through decades of AI research. This article aims to dispel the notion that this was a surprise advance by contextualizing the success of ChatGPT within the broader historical trajectory of AI. By delving into key moments and milestones, we can see that the path that led to the current wave of Generative AI is not just well understood, but that this destination was an expected and natural consequence of everything that came before it.

Once we realize that today’s advances in Generative AI are not isolated instances, but just the latest in a long line of similar advances, it raises the question – what can we learn from those previous instances that might be applicable to the Generative AI era? What are the traps and pitfalls that we need to avoid repeating? Because despite their undeniable initial success and the hype surrounding them, LLMs suffer from some long-standing issues of applied AI relating to the misinterpretation of real-world contexts, trust, explainability and alignment. So, what are the essential factors that truly drive success in new AI waves? Armed with this information, we can move forward into the new era of Generative AI, combining the best of the past with the opportunities of the present.

Stepwise Changes in AI: A Repeated Pattern

The evolution of technology often follows a pattern of stepwise changes, in which the swift introduction of new capabilities is followed by a longer period in which those capabilities are embedded into the fabric of society, laying the foundations for the next innovation. These changes redefine the boundaries of human capacity, reshaping industries and influencing the course of societies. The advent of ChatGPT and the broader interest in Generative AI exemplify one such transformative step, building incrementally on the progress of preceding decades to create something that seemingly comes out of nowhere.

History, however, shows us that this advance was neither as new nor as surprising as it might seem. Delving into this historical perspective unveils the intricate fabric of AI development leading inexorably to this point:

1940s-1950s: Formative Concepts

The mid-20th century marked the dawn of AI with foundational concepts that continue to shape its trajectory to this day. Alan Turing’s proposal of a “universal machine” set the stage for the theoretical understanding of computation, laying the groundwork for future AI endeavors. Vannevar Bush’s vision of the “memex,” an imaginary device that could store and link information, foreshadowed the notion of machine memory and knowledge representation. In 1951, the first neural network was created, over 70 years before generative AI would make extensive use of such networks.

1960s: The Birth of Symbolic AI

The 1960s witnessed the emergence of Symbolic AI, where researchers sought to emulate human reasoning through symbolic representation and logic. Early AI systems like the General Problem Solver (GPS) showcased the potential of this approach by solving problems in a specific domain. However, these systems struggled with handling ambiguity and lacked the ability to learn from data.

1970s: The Cognitive Revolution

The 1970s ushered in the Cognitive Revolution in AI, shifting the focus from symbolic manipulation to human-like thinking processes. Researchers explored areas such as knowledge representation, natural language understanding, and expert systems. Planning systems like STRIPS manipulated symbolic knowledge in order to plan real-world tasks. Notably, ELIZA, a computer program capable of simulating conversation, sparked discussions about the boundaries between human and machine communication, very similar to those being triggered by ChatGPT.
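
To get a feel for how shallow that early simulation of conversation actually was, here is a minimal ELIZA-style sketch using a handful of pattern rules. The rules and example exchange below are invented for illustration; they are not Weizenbaum’s original DOCTOR script.

```python
import re

# A few illustrative ELIZA-style rules: a regex pattern and a response template.
# These are hypothetical examples, not the original ELIZA script.
RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def respond(utterance: str) -> str:
    """Return the first matching canned response, or a generic prompt."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return "Please, go on."

print(respond("I am worried about the future"))
# -> "How long have you been worried about the future?"
```

The entire “understanding” is a surface-level pattern match, which is why such systems broke down as soon as the conversation left their scripted territory – a useful contrast to keep in mind when comparing ELIZA-era debates with today’s.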

1980s: The Knowledge Engineering Era

The 1980s marked the rise of the Knowledge Engineering era, where efforts were directed towards encoding human expertise into computer systems. So-called Expert Systems became popular, attempting to capture human decision-making processes as explicit rules. These hard-coded systems often found it difficult to cope with the infinite complexity of the real world, but they also showed how quite sophisticated decision-making could emerge from the application of a relatively small number of rules.
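
To make that last point concrete, here is a minimal sketch of a forward-chaining rule engine, where a few if-then rules chain together into a conclusion none of them states directly. The facts and rules are invented for illustration and not taken from any real expert system.

```python
# Minimal forward-chaining rule engine: facts are strings, each rule maps a set
# of required facts to a new conclusion. The rules below are invented examples.
RULES = [
    ({"engine_cranks", "no_spark"}, "ignition_fault"),
    ({"ignition_fault", "old_spark_plugs"}, "replace_spark_plugs"),
    ({"engine_does_not_crank", "lights_dim"}, "battery_flat"),
]

def infer(facts: set[str]) -> set[str]:
    """Repeatedly apply rules until no new conclusions can be drawn."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived - facts

print(infer({"engine_cranks", "no_spark", "old_spark_plugs"}))
# -> {'ignition_fault', 'replace_spark_plugs'} (set order may vary)
```

Even this toy example shows chained conclusions emerging from simple rules – and also the brittleness: any situation the rule author did not anticipate simply produces nothing.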

1990s: The Exploration of Reinforcement Learning

In the 1990s, Machine Learning became more prominent as a way to automate the very labour-intensive task of knowledge acquisition. Supervised and unsupervised machine learning algorithms of various kinds appeared for a variety of tasks. Multi-layer Neural Networks became a lively area of development and formed the basis for much of the success two decades later. During those years, the exploration of Reinforcement Learning (RL) gained prominence. Much of the research focused on playing games, as these provided a perfect environment to explore how rewarding certain behaviors could lead to the emergence of strategies and tactics that were more effective than anything hard-coded could be. Notable highlights included agents that learned to play backgammon (G. Tesauro, 1992) and chess (J. Baxter, S. Thrun, 1995). These advances laid the groundwork for the reinforcement learning that ChatGPT would later build on. However, many of the game-playing agents of this era still relied on brute-force search – looking ahead to enumerate possible moves – but this approach was fundamentally limited by the so-called “curse of dimensionality”. The real world, and complex games like chess and Go, have so many possible states that it is impossible to enumerate enough of them to make good long-term decisions. What was needed instead was the ability to learn general patterns of what “good” looks like, rather than always trying to make the best short-term move.
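
As a small illustration of this reward-driven learning, the sketch below implements tabular Q-learning on a tiny invented “corridor” game. The environment and parameters are made up for the example and are vastly simpler than backgammon or chess, but the core idea – improving value estimates from rewards rather than hard-coding moves – is the same.

```python
import random

# Tiny corridor game: states 0..4, start at state 0, reward 1.0 for reaching state 4.
# Actions: 0 = step left, 1 = step right. All values here are illustrative choices.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.1, 500

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]

for _ in range(EPISODES):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit current value estimates, sometimes explore.
        action = random.randrange(2) if random.random() < EPSILON else q[state].index(max(q[state]))
        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Temporal-difference update towards reward plus discounted future value.
        q[state][action] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][action])
        state = next_state

# After training, the greedy policy is simply "always step right".
print([q[s].index(max(q[s])) for s in range(GOAL)])  # -> [1, 1, 1, 1]
```

TD-Gammon applied the same temporal-difference idea, but with a neural network in place of a table – precisely the shift towards learning what “good” looks like rather than enumerating moves.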

2000s: The Era of Deep Learning

The 2000s witnessed the emergence of Deep Learning, characterized by the training of complex neural networks using extensive datasets. Convolutional Neural Networks (CNNs) transformed image analysis, while Recurrent Neural Networks (RNNs) tackled sequential data. This era automated feature extraction, enabling AI to recognize patterns in data. Before this time, the raw data would be pre-processed to extract pre-defined features that a human thought might be useful before those features were passed to the machine learning algorithm to learn from. Deep learning merged these two phases together, simultaneously figuring out what the most salient features were and how to learn from them. The ability to do this at large scale (the scale of a whole written language for example) would prove crucial in the generative era to follow.
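
The contrast between hand-crafted and learned features can be sketched in a few lines. The model below is a minimal, hypothetical PyTorch example (assuming torch is installed), not a description of any specific system from that era: it consumes raw pixels and learns its own feature detectors jointly with the classifier.

```python
import torch
from torch import nn

# A tiny convolutional network for 28x28 grayscale images (e.g. MNIST-sized input).
# The convolutional layers learn their own feature detectors from raw pixels,
# instead of being fed hand-engineered features such as edge counts or histograms.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                               # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                               # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                     # 10-class output
)

x = torch.randn(4, 1, 28, 28)   # a batch of 4 raw images, no manual feature extraction
print(model(x).shape)           # -> torch.Size([4, 10])
```

Both the feature detectors in the convolutional layers and the classifier on top are trained together by backpropagation – the merging of the two phases described above.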

The ability to do this at meaningful scale was partly due to algorithmic advances, but the majority of the progress came from hardware. Driven by demands from the gaming and entertainment industries, graphics processing units (GPUs) started to become more powerful and more parallelized than CPUs and were soon being used for general purpose computing including AI. For the first time, the highly parallelized structure of a neural network had highly parallelized hardware to run on and the leap in performance was astounding.
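
A rough way to see why this mattered: the core operation of a neural network is a large matrix multiplication, which a GPU spreads across thousands of cores at once. The snippet below is a simple sketch (assuming PyTorch and a CUDA-capable GPU are available) that times the same multiplication on both kinds of hardware.

```python
import time
import torch

def time_matmul(device: str, size: int = 4096) -> float:
    """Time one large matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()          # make sure setup has finished before timing
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()          # wait for the GPU kernel to complete
    return time.perf_counter() - start

print(f"CPU : {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU : {time_matmul('cuda'):.3f}s")   # typically one to two orders of magnitude faster
```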

2010s: Advances in Generative AI

The 2010s brought forth significant advancements in Generative AI. The introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) revolutionized the creative potential of AI. By learning the patterns present in data at massive scale, these systems were able to create compact and efficient representations of complex knowledge, and, in the case of GANs, by pitting two networks against each other, they were able to make those representations robust to the noise and complexity of the real world.
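
The adversarial idea can be sketched compactly: a generator learns to produce samples while a discriminator learns to tell them apart from real data, and each improves against the other. Below is a minimal, hypothetical PyTorch sketch on one-dimensional toy data (samples from a normal distribution), not a production GAN; the architectures and hyperparameters are arbitrary choices for illustration.

```python
import torch
from torch import nn

# Toy setup: "real" data are samples from N(3, 1); the generator maps noise to 1-D samples.
gen = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
disc = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # outputs a realness logit
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = 3.0 + torch.randn(64, 1)                 # real samples
    fake = gen(torch.randn(64, 8))                  # generated samples

    # Discriminator step: label real as 1, fake as 0.
    d_loss = bce(disc(real), torch.ones(64, 1)) + bce(disc(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(disc(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

with torch.no_grad():
    samples = gen(torch.randn(1000, 8))
print(samples.mean().item())   # the mean should drift towards roughly 3
```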

However, while GANs and VAEs excelled in generating realistic data samples and compressing information, they struggled with understanding and processing sequences of data, such as text or time series data. This limitation became a crucial impetus for the development of attention mechanisms, initially applied in the context of neural machine translation. Attention mechanisms allowed models to weigh the importance of different parts of a sequence when making predictions, which greatly improved their ability to handle sequential data. This pivotal shift laid the foundation for self-attention and ultimately culminated in the groundbreaking Transformer models. By harnessing attention to process sequences with unprecedented efficiency and effectiveness, Transformers revolutionized a wide range of AI applications, particularly in natural language understanding and generation.
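
The core computation is compact enough to write out. Below is a minimal NumPy sketch of scaled dot-product attention, the same operation that a Transformer’s self-attention layers apply many times in parallel; the shapes and values are arbitrary illustrations, and the learned projection matrices of a real model are omitted.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, stabilized by subtracting the row maximum."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # how relevant each position is to each query
    weights = softmax(scores)           # attention weights sum to 1 per query
    return weights @ v                  # weighted mixture of the values

# A toy sequence of 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real Transformer, Q, K and V are learned linear projections of x;
# here we reuse x itself just to show the mechanics.
out = attention(x, x, x)
print(out.shape)                        # -> (4, 8)
```

Each output position is a weighted mixture of every position in the sequence, and those weights are what let the model decide which parts of the input matter when interpreting a given word.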

The emergence of attention mechanisms, such as those used by Google’s BERT, OpenAI’s GPT and Meta’s LLaMA models, played a pivotal role in improving the ability to process natural language. By paying attention to the context in which a phrase was used, rather than just its grammatical and syntactic role, these models could disambiguate the vast number of possible interpretations of a passage of text far more effectively and quickly and, just as importantly, keep track of the discourse throughout longer dialogues and even across languages. This kind of functionality was unequaled at the time.
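
This context-sensitivity is easy to observe directly. The sketch below (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint are available) compares the embedding of the word “bank” in two different sentences; a static word embedding would give identical vectors, whereas a contextual model pushes them apart.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

river = embed_word("she sat on the bank of the river", "bank")
money = embed_word("she deposited money at the bank", "bank")
print(torch.cosine_similarity(river, money, dim=0).item())   # noticeably below 1.0
```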

But something was still lacking for general uptake and understanding of the possibilities of these powerful new algorithms and models: using them was still largely a technical exercise, and they needed tweaking before being useful.

2020s: Convergence and Integration

In the current decade, AI’s various strands of development are converging and integrating. Advanced attention mechanisms, coupled with domain-specific knowledge, and reinforcement-learning from human feedback, have given rise to models like ChatGPT. These models are capable of interpreting complex natural language inputs, responding contextually, and demonstrating capabilities that were once only possible when conversing with other humans.
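
One of the ingredients named above, reinforcement learning from human feedback, begins with a reward model trained on human preference pairs. The sketch below shows only that first step in a hypothetical, heavily simplified form – a pairwise logistic loss over scores for a preferred and a rejected answer – and is not OpenAI’s actual training code; the tiny network and random “embeddings” are stand-ins for illustration.

```python
import torch
from torch import nn

# Stand-in reward model: in practice this is a large language model with a scalar
# head; here a tiny MLP over pretend "response embeddings" keeps the sketch short.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: push the score of the human-preferred answer above the other."""
    return -torch.nn.functional.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()

# Fake batch of 8 preference pairs (embeddings invented for illustration).
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(chosen, rejected)
loss.backward()
optimizer.step()
print(loss.item())
```

The trained reward model then scores candidate answers during a subsequent reinforcement-learning phase, which is how human judgements of helpfulness end up shaping the model’s conversational behaviour.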

As Andrew Ng, a leading AI researcher, aptly put it, “AI is the new electricity. Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.” This sentiment hints at the beginnings of the stepwise changes AI is undergoing, with each wave of development bringing us closer to AI’s transformative potential.

The interdependent significance of both hardware and software players is vividly illustrated by the dual trend of hyperscalers venturing into hardware production while hardware producers delve into software, even launching their own large language models. This reciprocal movement highlights the cohesive synergy between these facets of technology. Recognizing this holistic interplay between Generative AI’s hardware and software is pivotal for unleashing its potential.

Building Blocks for a Successful Future with Generative AI

To propel Generative AI into a mature phase, a series of interrelated building blocks must be realized. While technological advancements in algorithms and contextual understanding are critical, the success of Generative AI also hinges on its social integration and trustworthiness.

However, it’s unlikely this technology will in itself lead to the AI future we desire. Current Generative AI may appear to excel in specific tasks, but it lacks broader awareness. Leading AI researcher Rodney Brooks highlights the gap between performance and general competence: narrow ability in one task should not be confused with general competence that can be confidently deployed in other tasks. More targeted, smaller models for specific domains might be necessary. These models, while not as generic as large-scale language models, can deliver quick and efficient solutions for certain contexts.

“We [humans] are able to generalize from observing performance at one task to a guess at competence over a much bigger set of tasks. We understand intuitively how to generalize from the performance level of the person to their competence in related areas… But the skills we have for doing that for a person break down completely when we see a strong performance from an AI program. The extent of the program’s competence may be extraordinarily narrow, in a way that would never happen with a person.” – Rodney Brooks

On the contextual front, Generative AI needs to not only interpret but also reason and deduce from context, understanding the intricacies of various situations. The challenge lies in equipping AI with background knowledge that allows it to function effectively across a wide array of contexts.

Moreover, trustworthiness is paramount for the widespread adoption of AI. As elaborated upon in our AI Lab blog series “AI: Going from good to useful”, the issue of trust hinges not just on the model’s technical performance but also on its ethical considerations, including privacy preservation, bias mitigation, and explainability.

Conclusion: Looking Beyond the Hype

The swift ascent of ChatGPT and Generative AI reflects a culmination of the incremental progress witnessed throughout AI’s history. What might seem like a sudden surge is, in fact, the result of decades of innovation, trial, and refinement. Today, as AI becomes an integral part of our societal fabric, this surge continues, evolving to encompass not just technical marvels but also profound ethical and social considerations.

While we stand on the cusp of AI’s transformative potential, it’s imperative to shift our perspective from surprise to understanding, acknowledging the lineage of ideas and innovations that have paved the way for this moment. As we embark on this journey, a reorientation is necessary, one that moves us away from the simplistic notion that technology alone can solve all problems. Instead, the next frontier for AI calls for a harmonious coexistence of digital and human actors, drawing insights from social sciences, cognitive psychology, ethics, and geopolitics to shape a future where technology amplifies humanity’s potential rather than supplants it. In the Capgemini AI Future Lab, we recognize this entire spectrum as being crucial to the commercial success of AI. The technological leap of Generative AI will not realize its full potential unless the secondary factors that support it are also developed. Embracing this holistic perspective, we’re poised to unlock the true magic of AI—a future where responsible innovation and ethics walk hand in hand.

Meet the author

Robert Engels

Global CTIO and Head of Lab for AI Futures and Insights & Data
Robert is an innovation lead and a thought leader in several sectors and regions, with a basis in his role of Chief Technology Officer for Northern and Central Europe in our Insights & Data Global Business Line. Based in Norway, he is a known lecturer, public speaker, and panel moderator. Robert holds a PhD in artificial intelligence from the Technical University in Karlsruhe, Germany.