While others have proposed that attention or scale is “all you need” to unleash AI’s full potential, the greatest challenge we face may not be creating a bigger model, but creating a more grounded one.

The discipline of Alignment Engineering focuses on aligning AI model outputs with data, context, world models and values, and in doing so finally opens the door to AI systems that we can truly trust.

The term alignment in AI has long been used in a narrow sense to mean creating AI that responds, on average, in the same way as humans do. However, this massively understates the broader goal of alignment. It shouldn’t just be about aligning with one narrow aspect of human discourse, but about making AI align with everything that matters to us, whether that relates to concrete facts and data or to more subjective ethical objectives. An unaligned AI is a perfect map of a territory that does not exist, and to make it trustworthy we must force it to inhabit our world. Future AI systems that make important decisions will need semantic clarity that goes far beyond what a language model alone can ever provide.

There has been much focus on internal model performance as a route to success in AI, but external model alignment is just as important, if not more so. We cannot simply hope that a sufficiently sophisticated model will discover what is important to us; we must consciously and deliberately build this alignment into our systems. We use the term Alignment Engineering to refer to this crucial discipline for the future of AI.

So, what are we actually aligning? In biological intelligence, we see a continuous process of reconciling our sensory experiences with our beliefs, internal world models and knowledge, constantly trying to match what we see and hear with what we know or believe. An AI system faces a similar challenge: to align its direct data-driven experiences with pre-existing knowledge, principles, context and world models. In short, there should be a constant negotiation between what an AI’s model “thinks” and the grounded worldview and principles it has been given. We refer to this latter collection of beliefs as a worldview, but it also goes by many other names, such as belief systems, schemas, cognitive frameworks, ontologies, or constitutions.

Aligning to different levels of the worldview

Within a worldview, there can be many different types of knowledge, which have different degrees of objectivity/subjectivity:

  • Data – We need to ensure that AI systems are grounded in accurate, up-to-date, and relevant data. Techniques like Retrieval-Augmented Generation (RAG) can be seen as examples of Alignment Engineering for data – efforts to align our model with a specific dataset (a minimal sketch of this pattern follows this list).
  • Contextual and world models – The term world model here is used in the broadest sense. It does not necessarily refer to the physical world but instead refers more generally to the totality of knowledge relevant to the environment the system is operating within. For example, a robot operating in a busy city will need a comprehensive physical and societal model to operate effectively, but a digital AI agent operating inside an e-commerce website only needs to know about a limited world of products, stock, and orders. There is a big difference between the world and a world.
    To be clear, these world models represent the a priori knowledge about that environment, which is different from the operational data that might be used in training or perceived during use. Aligning to the rules that govern a world and understanding causation within it allows an AI system to hypothesize and predict scenarios within it, even if it has no direct experience of them. Alignment Engineering in this category is about discovering, disambiguating, and encoding those rules, and baking them into overall AI systems so that they can be used effectively, and with confidence, to make decisions in combination with more data-driven approaches.
  • Societal and ethical principles – While the previous points are about adhering to concrete data, facts and rules, alignment to laws, ethics, culture and morals provides an even greater challenge, precisely because these are less tangible and more subjective. There may also be multiple different and sometimes contradictory objectives to align against. For example, my personal ethical outlook might differ from that of other users, or of the company I work for, which may in turn differ from the cultural norms and ethics in the many different countries that my company operates in. The job of an Alignment Engineer here is to make the intangible tangible: to take an abstract and subjective concept and turn it into concrete constraints that can be implemented within the AI system (one way of doing this is sketched in the second example after this list).
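
To make the data level concrete, here is a minimal sketch of RAG-style data alignment: retrieve the most relevant records, then condition the model's answer on them. Everything in it is a stand-in for illustration; the toy document store, the keyword-overlap retriever and the `call_model` placeholder are assumptions, not any particular product's API.

```python
# Minimal sketch of aligning a model's answer to a specific dataset (RAG-style).
# The document store, the scoring heuristic and call_model are illustrative
# placeholders, not a real retrieval library or model API.

from typing import List

DOCUMENTS = [
    "Order 1042 shipped on 2024-03-02 and was delivered on 2024-03-05.",
    "Product SKU-77 is out of stock; restock expected in two weeks.",
    "Our returns policy allows refunds within 30 days of delivery.",
]

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_terms & set(d.lower().split())))
    return ranked[:k]

def call_model(prompt: str) -> str:
    """Placeholder for a real language-model call."""
    return f"[model answer conditioned on a prompt of {len(prompt)} characters]"

def grounded_answer(question: str) -> str:
    """Align the model with the dataset by forcing it to answer from retrieved context."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)

print(grounded_answer("When was order 1042 delivered?"))
```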
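
At the world-model and ethical levels, alignment often takes the form of explicit, machine-checkable constraints that a model's proposed output or action must satisfy before it is accepted. The sketch below is purely illustrative: the rules, the policy thresholds and the `ProposedAction` structure are invented for this example and are not drawn from any established framework.

```python
# Illustrative sketch: encoding world-model rules and ethical/policy principles
# as explicit checks applied to a model's proposed action. All names, rules and
# thresholds here are hypothetical examples.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProposedAction:
    item: str
    quantity: int
    discount_pct: float
    customer_age: int

# World-model rule: a priori facts about the environment (here, a toy shop's inventory).
def stock_rule(a: ProposedAction) -> List[str]:
    stock = {"SKU-77": 0, "SKU-12": 5}
    available = stock.get(a.item, 0)
    if a.quantity > available:
        return [f"cannot sell {a.quantity} of {a.item}: only {available} in stock"]
    return []

# Societal/ethical principles, made tangible as concrete constraints.
def policy_rule(a: ProposedAction) -> List[str]:
    violations = []
    if a.discount_pct > 50:
        violations.append("discount exceeds the 50% limit set by company policy")
    if a.customer_age < 18 and a.item == "SKU-12":
        violations.append("SKU-12 is age-restricted and the customer is a minor")
    return violations

RULES: List[Callable[[ProposedAction], List[str]]] = [stock_rule, policy_rule]

def check_alignment(action: ProposedAction) -> List[str]:
    """Return all worldview violations; an empty list means the action is admissible."""
    return [v for rule in RULES for v in rule(action)]

action = ProposedAction(item="SKU-77", quantity=3, discount_pct=60, customer_age=17)
for violation in check_alignment(action):
    print("blocked:", violation)
```

The details here are trivial on purpose; the point is the shape of the approach, in which subjective principles and environment rules become concrete, testable constraints that sit alongside the data-driven components of the system.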

The Alignment Engineering workflow

Regardless of what we’re aligning to, there are a number of common steps in the Alignment Engineering workflow:

  • Worldview representation – To align against something, we need that thing to be digitally represented. For simple data this is trivial, but it becomes much more complex when dealing with more abstract concepts such as world models or ethics. The model’s output can then be aligned against the resulting Worldview.
  • Worldview alignment – Once we have created the Worldview to align against, we must actually change the behavior of our AI system to reconcile the outputs of the AI model with the constraints and principles embodied in our Worldview. This is a complex discipline with no one-size-fits-all solution, as it involves making value judgments about the precedence of different types of knowledge. For example, is my raw sensory data more important than what my world model says? Should my company ethics override national or international law?
  • Worldview maintenance – Our beliefs must be constantly updated as our understanding of the world improves, or as facts become obsolete or superseded (the sketch below walks through all three steps in miniature).
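
As a rough illustration of how these steps might fit together, the sketch below assumes a deliberately simple worldview: named beliefs, each with a source and a precedence level, a reconciliation step in which stored beliefs override a conflicting model claim, and an update step for maintenance. The `Belief` structure and the precedence ordering are assumptions made for this example, not a prescribed design.

```python
# Illustrative sketch of the workflow: represent a worldview, align a model
# claim against it, and maintain the worldview over time. Structures and
# precedence values are hypothetical.

from dataclasses import dataclass
from typing import Dict

# 1. Worldview representation: beliefs with an explicit precedence
#    (a higher number takes priority in a conflict).
@dataclass
class Belief:
    key: str
    value: str
    source: str
    precedence: int

worldview: Dict[str, Belief] = {
    "refund_window_days": Belief("refund_window_days", "30", "company policy", 2),
    "vat_rate_pct": Belief("vat_rate_pct", "20", "national law", 3),
}

# 2. Worldview alignment: reconcile a model's claim with the stored belief.
def reconcile(key: str, model_claim: str) -> str:
    belief = worldview.get(key)
    if belief is None:
        return model_claim  # nothing to align against
    if model_claim != belief.value:
        # The worldview wins here; a fuller system would weigh precedence
        # between conflicting beliefs and the model's own confidence.
        return belief.value
    return model_claim

# 3. Worldview maintenance: update beliefs as facts are superseded.
def update_belief(key: str, new_value: str, source: str, precedence: int) -> None:
    current = worldview.get(key)
    if current is None or precedence >= current.precedence:
        worldview[key] = Belief(key, new_value, source, precedence)

print(reconcile("refund_window_days", "45"))   # model claim corrected to "30"
update_belief("refund_window_days", "60", "revised company policy", 2)
print(reconcile("refund_window_days", "45"))   # now corrected to "60"
```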

The challenge and opportunity of Alignment Engineering

This lens of Alignment Engineering unifies many things that have previously been separated. Within this view of the field, tasks like “ensure the AI’s output matches our company policy”, “ensure the predictions align with the laws of physics” and “ensure the AI acts ethically” are all just instances of the same general task of reconciling a model’s output with a formally stated worldview or belief system. This presents a powerful opportunity to take previously abstract concepts such as ethics and make them real.

However, much work is still needed to build out the Alignment Engineering toolbox. Some of this already exists in classical AI. For example, Bayesian Belief Networks, ontologies, knowledge graphs and other similar approaches are ideal representations of world models that can characterize the interconnected messiness of the real world. Alignment Engineers will need to master the complexity, contradiction, and uncertainty of the real world to create systems that can make good decisions in the face of those challenges. The potential reward is huge, though. AI systems that are aligned to our belief systems are AI systems that we can trust and rely on, where accuracy is based on real understanding, and explanation is based on a genuinely shared worldview rather than on impenetrable mechanistic descriptions.
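
For instance, treating individual beliefs probabilistically lets a system weigh uncertain evidence from a model or sensor against prior knowledge, rather than accepting or rejecting it outright. The fragment below applies a single Bayes-rule update to one belief; real Bayesian Belief Networks are far richer than this, and the numbers are invented purely for illustration.

```python
# Minimal illustration of handling uncertainty in a worldview: a single
# Bayes-rule update of the probability that a belief is true, given one
# piece of evidence. All probabilities are made up for the example.

def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Posterior P(belief | evidence) via Bayes' rule."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator

# Belief: "SKU-77 is in stock", held with 90% confidence before new evidence.
prior = 0.90
# Evidence: a model/extraction step reports the item as out of stock. Assume
# that report appears 10% of the time when the item really is in stock (a false
# alarm) and 80% of the time when it really is out of stock.
posterior = bayes_update(prior=prior,
                         p_evidence_if_true=0.10,
                         p_evidence_if_false=0.80)
print(f"confidence the item is in stock: {prior:.2f} -> {posterior:.2f}")
```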

True intelligence in a system is not measured by the scale, speed or eloquence of its processing, but by the fidelity of its connection to reality. Alignment is the gravity that keeps AI from drifting into delusion and hallucination, and by anchoring it to our worldview, we transform it from a clumsy probabilistic engine into a dependable partner that deeply respects the architecture, principles and values of that world.