Can data drive more successful R&D for life sciences?

October 9, 2020

The demand for new, personalized, trusted medicines and therapies delivered quickly and cost-effectively is increasing worldwide – especially during a global pandemic.

As a civilization, we are arguably in a historically unprecedented position, with greater understanding of illness and disease and more patient data available than ever before. But the route to market for medicines has not changed significantly in many years, and the status quo puts numerous obstacles in the way of that opportunity.

Data-driven R&D: the challenge and the opportunity

Digitalization in healthcare is impacting the entire value chain and is generating large amounts of heterogeneous data – but often that data is not usable or accessible enough to be harnessed.

For example, the fight against cancer has delivered explosive growth in the volume and richness of both proprietary and publicly available data that links the human genome with the molecular basis of cancer and the efficacy of drug-like molecule cancer treatments. However, these datasets are highly varied, often unstructured, and generated by different tools and techniques, resulting in data quality and consistency issues. Clinical oncologists need to be able to search, filter, and interactively explore a unified view across the full scope of this data in a single, intuitive platform.

Similarly, life sciences and R&D professionals spend a large amount of time examining clinical trial notes, patient data sets, and documents to identify key issues (e.g., adverse drug effects) and to get an overall view of the document content. These documents and medical records are highly complex and unstructured, often leading to information overload and to inaccurate or missing information. Oncologists need help in deriving insights and understanding patterns from unstructured documents.

Increasing R&D costs and lead times

Drug discovery is still an expensive and time-consuming process for mass-market drugs. But demand for customized and personalized healthcare solutions (including for rare diseases) is increasing, with the rising time and cost burden of regulatory approval making the situation more acute.

Drug discovery and clinical trials generally have two key phases, each with two major steps:

  1. Early-stage research and development
  • Target discovery and validation of the mechanisms driving disease that can be targeted with specific molecules, compounds, or formulations
  • Lead discovery and optimization: identification and modification of drug candidates that impact the behavior of the chosen disease target
  2. Preclinical and clinical trials
  • Design and planning of (pre-)clinical trials, covering topics such as site and patient selection, logistics and supplies, and clinical treatment regime and monitoring
  • Conduct of clinical trials: managing risks as they arise, collecting and processing data, and ensuring regulatory approval and compliance

Typically, phase one can take three to five years and phase two can take six to seven years, which puts the full cycle at roughly nine to twelve years. It is difficult to shorten this cycle because there are few financial or human levers available – the amount of research work, skills and regulation involved means that companies cannot simply throw more money or people at the process.

The availability of big data tools and data science, with their potential to unlock the R&D process, makes these skills even more critical. Medicines research scientists may lack the time and specialist skills to master modern data techniques that search and structure vast amounts of high-dimensional data to identify causal relationships that can generate new hypotheses at scale.

Newly qualified data scientists who haven’t worked in R&D-heavy organizations, including life sciences and healthcare, can struggle to understand the pharmaceutical science and drug approval process, how best to represent and interpret it accurately with their tools, or where to focus their efforts.

In addition, some rigid data discovery platforms struggle to be responsive enough for the rapid cadence that drug discovery R&D demands. For example, they may not be able to adapt quickly enough to fast-changing demands for pipelining new data sources, ingesting complex unstructured data sets, supporting new data science tooling, or deploying new R&D models and capabilities at scale.

To meaningfully accelerate R&D, all of these dimensions need to be aligned and executed accurately, with the right systems and infrastructure in place to enable rapid development of new models and immediate pipelining of the data needed for success. And there is no easily scalable human solution to this.

AI and data science promise a steady stream of new solutions

Artificial intelligence (AI), data science, and analytics are making a significant impact on R&D and drug discovery. These tools and techniques help research scientists unravel the relationships, behaviors and genetic factors driving disease, search and analyze the complex chemical space more efficiently, and increase the efficiency of clinical trials by optimizing trial management, recruitment, patient reporting, and real-world data collection.

AI-enabled solutions can reduce the time and cost of getting new therapies to market, avoid costly late-stage failures, and accelerate regulatory approval processes and reporting. They can also deliver more personalized therapeutics that have a significant impact on the patient's outcome and quality of life.

Experience and insight for the future

Life sciences and pharmaceutical suppliers are taking advantage of AI and data science in their R&D organizations. But they are finding further challenges in evolving their established R&D processes and teams to exploit them at scale. AI for R&D only delivers value when innovative solutions are implemented at scale and are trusted – and most AI project failures happen because one of these elements is missing.

Changing the culture and daily work of R&D can be a major conceptual change for life sciences companies, and one that cannot happen overnight. And even developing the right drug for a disease is no guarantee of success – the timing of the launch, branding and pricing all play a part.

At Capgemini, helping life sciences companies to adopt AI and data science and embed them into their R&D processes is an area of expertise. We have enabled some of the biggest global biopharma companies to explore the potential of AI and data-driven R&D in many novel and diverse areas including rare disease diagnostics, candidate molecule searches, adverse effect pattern identification and patient selection for clinical trials.

We offer a full set of skills to match all stakeholders, and we can draw on 25,000+ experts and key partnerships, including with cloud hyperscalers, which enable scale to be planned in at the outset, to deliver proof of value rather than a proof of concept. Learn all about our Data-driven R&D transformation for life sciences!


Anne Laure Thieullent – Vice President, Artificial Intelligence and Analytics Group Offer Leader, Capgemini
I advise Capgemini clients on how they should put Artificial Intelligence solutions to work for their organization. Choosing the right technology for the right usage is key, but how your company should change the way it acts around data is vital. My passion is to bring technology, business transformation and governance together and take our clients to where they want to be as Intelligent Enterprises, while cultivating the values of trust, privacy and fairness.

Nick Clarke – Head of Analytics at Altran
I have developed an unusual breadth of experience through delivering solutions to data-driven problems across many different industry sectors, including oil & gas, transportation and drug discovery. In combination with 10 years spent developing new models of chemical bond formation, I have a strong feel for what can and cannot be done with data. I have learned to focus upon people and their motivations first, worrying about the details of the data once I have understood those. This means I often have a different take on problems to other data professionals.