How to develop your pharma laboratory into an AI-driven research factory


In our earlier blog article about AI service units, we discussed establishing an AI service unit to scale AI successfully. This concept can be adapted to various industries. Today, we describe how pharmaceutical laboratories can be turned into AI-driven research factories by applying our digital lab framework.

For several years, R&D organizations in pharma and life sciences have been investing heavily in the connection of lab instruments and software. Although this development isn’t complete, it’s time to think about how these efforts will further increase insight creation. Once all lab software and instrumentation have been connected, lab data will be more readily available and human error in everyday lab work can be reduced.

But how does the business benefit from connected lab data?

Digitalization of the lab brings many advantages: it enables global collaboration within the company, increases the efficiency of work performed in the lab, and yields more data of higher quality. Insights are created more quickly and innovative technologies are adopted sooner. Furthermore, structures for the routine design, testing, deployment, and scaling of data analytics use cases help produce tangible results that fulfill specific business goals, for instance, identifying a new molecular target for the development of a coronavirus vaccine.

In the digital lab framework, capabilities are aligned coherently within six dimensions: harmonized agile workflows, data quality management, data standardization, data governance, data analytics, and the digital lab platform. The illustration below and the following paragraphs provide details on how we see pharma laboratories transforming into AI-driven research factories.

Harmonized agile workflows

Implementing AI is a strategic decision. Defining a clear vision and mission for the AI-driven research factory that aligns with the strategy of the pharma company is a key factor for success. Laboratories are the competence centers that generate business value through innovation. The vision defines general goals for a pharma company’s labs and R&D units. These goals are then translated into capability portfolios. Capability portfolios in pharma labs can be defined by analytical methods (such as mass spectrometry, chromatography, next-generation sequencing, or vector design) or by research topics (such as cardiovascular diseases or oncology), each encompassing all methods necessary to facilitate discovery.

Furthermore, it is important to decide which target operating model will be applied to the network of labs that makes up the R&D organization. Are all labs dedicated to discovering new targets for the treatment of psychiatric and neurologic disorders located in one place, or are they spread out globally? Will there be an organizational connection with cardiovascular disease research labs in order to find possible synergies or further investigate known scientific connections? Or, alternatively, would a laboratory center perform the experiments while a think tank delivers the experimental design and rationale?

Last but not least, while agile ways of working have long been established in software development, we see great benefits in introducing agility to the lab as well. Setting up agile teams with the lab or project lead acting as product owner and the lab technician as scrum master allows for a fail-fast culture with frequent creation of knowledge increments. We recommend including a data scientist in each lab scrum team as the person who takes care of data from the moment it is created.

Data quality management

Consequently, the data scientist executes the data-centric processes in the lab, in addition to the “usual” biochemical, method-based lab workflow that biochemists handle. The main data-centric processes in the lab are data quality management, data standardization, and data governance.

Data quality management ensures that the raw data produced in the lab is monitored closely for anomalies. Special focus should be placed on the data quality dimensions of completeness, consistency, validity, and accuracy. Should an anomaly be detected in the raw data sets, biochemists can go back, search for its root cause, and repeat the experiment right away.
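As an illustration only, the four quality dimensions could be monitored with simple rule-based checks such as the sketch below. The `Reading` record, the absorbance range, the unit "AU", the control sample, and the tolerance are all hypothetical assumptions chosen for the example, not part of the digital lab framework itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    sample_id: str
    absorbance: Optional[float]  # None models a value the instrument failed to report
    unit: str


def quality_issues(readings, valid_range=(0.0, 4.0), expected_unit="AU",
                   controls=None, tolerance=0.05):
    """Return (sample_id, dimension, message) tuples for every rule violation."""
    controls = controls or {}
    low, high = valid_range
    issues = []
    for r in readings:
        # Completeness: a reading without a value cannot be checked further.
        if r.absorbance is None:
            issues.append((r.sample_id, "completeness", "missing value"))
            continue
        # Consistency: all readings are expected to use the same unit.
        if r.unit != expected_unit:
            issues.append((r.sample_id, "consistency",
                           f"unit {r.unit!r}, expected {expected_unit!r}"))
        # Validity: the value must lie inside the plausible range (assumed here).
        if not low <= r.absorbance <= high:
            issues.append((r.sample_id, "validity",
                           f"{r.absorbance} outside [{low}, {high}]"))
        # Accuracy: control samples must match their known reference value.
        expected = controls.get(r.sample_id)
        if expected is not None and abs(r.absorbance - expected) > tolerance:
            issues.append((r.sample_id, "accuracy",
                           f"{r.absorbance} deviates from control value {expected}"))
    return issues


# Hypothetical raw data from one experiment.
readings = [
    Reading("S1", 0.82, "AU"),
    Reading("S2", None, "AU"),
    Reading("S3", 5.10, "AU"),
    Reading("S4", 0.40, "mAU"),
    Reading("CTRL", 1.20, "AU"),
]
issues = quality_issues(readings, controls={"CTRL": 1.00})
for sample_id, dimension, message in issues:
    print(f"{sample_id}: {dimension}: {message}")
```

In this made-up data set, sample S1 passes all checks, while each of the other four readings violates exactly one dimension. A real lab would derive the thresholds and reference values from its own methods and instruments.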

Data standardization

Data standardization has always been a major challenge in any life sciences lab because many different methods are performed with instruments from many different vendors. Most instruments create data in their own format, which makes it almost impossible to compare data across instruments. Standardized data formats, such as the Allotrope Data Format (ADF) and the Analytical Information Markup Language (AnIML), as well as individual data standardization concepts aim to close this gap in lab data science.
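The idea of an individual standardization concept can be sketched as a set of small adapters that map each vendor export onto one common record. The two vendor formats, the field names, and the target schema below are invented for illustration; this is not an implementation of ADF or AnIML.

```python
import csv
import io
import json

# Hypothetical export of vendor A: CSV, retention time in minutes.
VENDOR_A = "Sample,RT_min,Area\nS1,2.50,1500\nS2,3.10,900\n"
# Hypothetical export of vendor B: JSON, retention time in seconds.
VENDOR_B = '[{"id": "S3", "rt_sec": 150.0, "peak_area": 1200}]'


def from_vendor_a(text):
    """Map vendor A rows onto the common record, converting minutes to seconds."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"sample_id": row["Sample"],
             "retention_time_s": float(row["RT_min"]) * 60.0,
             "peak_area": float(row["Area"]),
             "source": "vendor_a"} for row in rows]


def from_vendor_b(text):
    """Map vendor B entries onto the common record (already in seconds)."""
    return [{"sample_id": entry["id"],
             "retention_time_s": float(entry["rt_sec"]),
             "peak_area": float(entry["peak_area"]),
             "source": "vendor_b"} for entry in json.loads(text)]


# After conversion, records from both instruments share one schema and one unit.
records = from_vendor_a(VENDOR_A) + from_vendor_b(VENDOR_B)
for record in records:
    print(record)
```

Once every instrument has such an adapter, downstream analytics only need to understand the common record, regardless of which vendor produced the data.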

Data governance

Data governance establishes a structure for lab data. It describes the meaning of data, where it is stored, how it is generated and used, and who manages it. Data governance ensures compliance with data quality and data standardization rules as a baseline for data analytics. Together, these processes thoroughly prepare the data for analytics and AI use cases.
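One way to picture these governance questions is as a catalog entry per data set that must be fully filled in before the data is released for analytics. The following is a minimal sketch; the `DatasetRecord` fields and all example values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    name: str
    meaning: str       # what the data represents
    storage: str       # where it is stored
    generated_by: str  # how it is generated
    used_for: str      # how it is used
    steward: str       # who manages it


def missing_fields(record):
    """Return the names of governance fields that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical catalog entry with one governance question left unanswered.
record = DatasetRecord(
    name="hplc_run_results",
    meaning="Peak areas per sample from HPLC runs",
    storage="lab-data-lake/hplc/",
    generated_by="HPLC instrument export",
    used_for="",
    steward="Lab data scientist",
)
print(missing_fields(record))
```

Here the check reports that the intended use of the data set has not been documented yet, so the entry would not yet count as governed.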

Digital lab platform and data analytics

Key technologies in the digital lab are, on the one hand, digitalized biochemical analysis methods and, on the other hand, data analytics technologies. In the past, however, the latter have not received as much attention as the full digitalization of the lab space requires. The technology stack recommended in the digital lab framework covers tools from both areas. These tools are connected to the digital lab platform in a self-service, plug-and-play manner.

In the lab environment, the digital lab platform also serves as an AI platform that is central to each AI-driven research factory. The AI platform supports every phase of the AI lifecycle, from data sourcing to discovery to deployment. AI use cases are prioritized according to the R&D strategy, and the first AI prototypes are developed and tested. If a prototype passes its tests, it is scaled to several labs and lab units. This brings a wealth of new insights and ideas about which new business models can be developed.

Central to the delivery of these data-centric services is, again, a platform that allows self-service of the required tools. This digital lab platform serves as a means of integration between the biochemical and the data technologies needed in the pharma R&D lab of the future: (big) data streaming tools, data governance tools, biochemical software tools, data visualization and analysis tools, and the like are integrated into the platform and readily available for use.

Thus, the digital lab framework enables R&D to create and deliver high-potential AI use cases and translate them from prototypes into real business value.

Capgemini Invent’s digital lab experts help you analyze your current lab landscape, rate the maturity of data quality management, data governance, data standardization, and data analytics, and introduce agile ways of working in the lab.

We then identify areas for improvement and derive measures to turn your R&D organization into a data-driven research factory.

This ultimately represents a substantial advancement of the digital transformation that began in labs some time ago with key initiatives such as instrument connectivity and the paperless lab.

To learn more, please visit us.


Dr. Katja Tiefenbacher is an AI strategy expert at Capgemini Invent focusing on target operating models and data management for scaling AI.