Accelerating drug discovery with Artificial Intelligence (AI)


AI is emerging as an essential tool for transforming the drug development process. With AI we have the potential to discover actionable insights more rapidly than ever.

In January 2020, British company Exscientia and Japan-based Sumitomo Dainippon Pharma announced that the drug DSP-1181 would enter phase I clinical trials. What makes this news notable is that DSP-1181, a treatment for obsessive-compulsive disorder, was invented entirely by AI. This marks the first time an AI-generated drug will be tested on humans, and it is an exciting milestone for AI and its role in the pharmaceutical industry.

The drug development process

Drug development is lengthy and complex. Taking a drug from inception to a license for marketing takes, on average, 12 years. Such a pathway is also incredibly expensive, with one study estimating costs ranging from US$314 million to US$2.8 billion. Figure 1 outlines the different stages of development and some of the activities involved. The reverse arrows illustrate how learnings from previous trials and developments inform the approach of studies in early-stage discovery.

Figure 1 Adapted from “Drug discovery and development: Role of basic biological research” by Mohs RC and Greig NH, 2017, Alzheimer's & Dementia: Translational Research & Clinical Interventions, 3: 651-657.

One of the most time- and cost-intensive parts of this process is the initial drug discovery. At this stage, a thorough understanding of disease and pharmacology is necessary to identify target molecules from quite literally millions of possibilities. It is here that the power of AI can be harnessed.

How can AI help?

Translating unstructured data

To gain an understanding of a disease, researchers must analyse an ever-growing library of publications and media. Digesting such large volumes of information manually is nearly impossible. Natural Language Processing (NLP) is used to extract information from unstructured data and transform it into a format that can be more easily explored and analysed. NLP can help minimise labour-intensive manual curation, thereby streamlining literature review.
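As a rough illustration of turning unstructured text into structured records, the sketch below counts drug and disease names that appear in the same sentence. It is a deliberately simple stand-in for real NLP pipelines: the word lists `DRUGS` and `DISEASES`, the helper names, and the sample text are all assumptions made for this example, and a production system would rely on curated ontologies or trained entity-recognition models rather than fixed vocabularies.

```python
import re
from collections import Counter

# Hypothetical vocabularies; a real system would use curated ontologies
# or a trained named-entity recogniser instead of fixed word lists.
DRUGS = {"remdesivir", "metformin", "aspirin"}
DISEASES = {"covid-19", "diabetes", "fibrosis"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_pairs(text):
    """Count (drug, disease) pairs mentioned together in the same sentence."""
    pairs = Counter()
    for sentence in split_sentences(text):
        # Lowercase tokens made of letters, digits and internal hyphens.
        tokens = set(re.findall(r"[a-z0-9][a-z0-9-]*", sentence.lower()))
        for drug in sorted(DRUGS & tokens):
            for disease in sorted(DISEASES & tokens):
                pairs[(drug, disease)] += 1
    return pairs


# Invented sample text standing in for a batch of abstracts.
abstracts = (
    "Remdesivir shortened recovery time in patients with COVID-19. "
    "Metformin is widely prescribed for diabetes. "
    "Aspirin was not evaluated in this cohort."
)

pairs = extract_pairs(abstracts)
for (drug, disease), count in sorted(pairs.items()):
    print(f"{drug}\t{disease}\t{count}")
```

The output is a small table of co-mentions that can be filtered, aggregated, or loaded into a database, which is the kind of structured format that makes a large body of literature easier to explore.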

Molecular design

Novel drugs are usually developed using a design-make-test feedback loop. This involves designing new targeted compounds, synthesising them in a laboratory, and then testing for biological effect and efficacy. Recently, several research groups have focused on the use of neural generative methods for molecular design. Generative models are trained on large datasets of bioactive molecules with drug-like properties. These models can then generate compounds with targeted properties, making them suitable drug candidates.

A study by the biotechnology company Insilico Medicine used a generative reinforcement learning model to produce 30,000 designs for molecules that target one of the proteins involved in the creation of scar tissue during repair following injury. Lead candidates were identified and manufactured, and the two most promising molecules were then tested in cells. Eventually, one lead candidate was selected and tested on mice, where the AI-generated compound demonstrated strong potency against the targeted protein. The journey from design to experimental validation took less than two months in this study, illustrating just how rapidly AI can produce leads for human researchers to pursue.

Drug repurposing

Drug repurposing, also known as drug repositioning, involves finding new medical indications for existing drugs. In recent years it has grown in popularity as a way of developing new treatments, given the reduction in cost and time compared with novel drug design. NLP, mentioned above, is one method that can be used in drug repurposing. Other approaches include various network-based methods such as clustering and propagation. In one recent case, a deep-learning drug interaction model was used to predict affinity scores between commercially available antiviral drugs and target proteins. The model is based on a mechanism most commonly used in NLP tasks and is motivated by the idea that molecular sequences are similar to natural language sentences. It identified remdesivir, an antiviral initially developed for hepatitis C, as a potential treatment for COVID-19. A later study found that remdesivir may be effective in shortening time to recovery after COVID-19 infection.
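To give a feel for the network-based propagation approach mentioned above, here is a minimal sketch of a random walk with restart on a toy graph. This is one common propagation technique, not necessarily the method used in any study cited here. The node names, the edges, and the `restart` and `iterations` values are all invented for illustration and carry no biological meaning.

```python
# Hypothetical toy network: edges link drugs to protein targets and
# targets to diseases. None of these links are real biological data.
EDGES = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("drug_B", "protein_2"),
    ("drug_C", "protein_3"),
    ("protein_1", "disease_X"),
    ("protein_2", "disease_X"),
    ("protein_3", "disease_Y"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def random_walk_with_restart(adjacency, seed, restart=0.3, iterations=50):
    """Propagate a score outwards from the seed node.

    At every step a fraction `restart` of the total score returns to the
    seed, and the remainder spreads evenly from each node to its neighbours.
    """
    scores = {node: 0.0 for node in adjacency}
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, score in scores.items():
            share = (1.0 - restart) * score / len(adjacency[node])
            for neighbour in adjacency[node]:
                updated[neighbour] += share
        updated[seed] += restart
        scores = updated
    return scores


adjacency = build_adjacency(EDGES)
scores = random_walk_with_restart(adjacency, seed="disease_X")

# Rank only the drug nodes by how much score reached them.
ranking = sorted(
    (node for node in scores if node.startswith("drug_")),
    key=lambda node: scores[node],
    reverse=True,
)
print(ranking)
```

In this toy graph, drug_A ranks first because it connects to both proteins linked to disease_X, drug_B follows with a single shared target, and drug_C receives no score because it sits in a disconnected part of the network. Real repurposing networks are far larger and noisier, so scores like these would only be a starting point for expert review.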


AI within the drug development process is certainly showing promising results. As with many innovations, however, there are challenges that the industry needs to tackle.

Data quality

In Jon Howells’ post, we learned about the importance of data quality in Data Science. Without strong data quality foundations, initiatives are not likely to produce expected results. Any AI approach in the medical industry will use data from various sources, which are often collected and recorded using different standards. Moreover, there can simply be a lack of relevant data, particularly for new molecules. These considerations, if not addressed, will likely skew results.
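The problem of differing standards can be made concrete with a small sketch. It assumes, purely for illustration, that two sources report a potency measurement in different concentration units and that some records are incomplete. The field names, the unit table `TO_NANOMOLAR`, and the records themselves are hypothetical.

```python
# Conversion factors to nanomolar (nM) for the units assumed in this sketch.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}

# Hypothetical records, as if pulled from two sources with different conventions.
records = [
    {"compound": "cmpd_1", "ic50": 250.0, "unit": "nM", "source": "lab_a"},
    {"compound": "cmpd_2", "ic50": 0.5, "unit": "uM", "source": "lab_b"},
    {"compound": "cmpd_3", "ic50": None, "unit": "nM", "source": "lab_a"},
    {"compound": "cmpd_4", "ic50": 12.0, "unit": "mg/L", "source": "lab_b"},
]


def harmonise(records):
    """Convert values to nM and set aside records that cannot be converted."""
    clean, rejected = [], []
    for record in records:
        factor = TO_NANOMOLAR.get(record["unit"])
        if record["ic50"] is None:
            rejected.append((record["compound"], "missing value"))
        elif factor is None:
            rejected.append((record["compound"], f"unknown unit {record['unit']}"))
        else:
            clean.append({
                "compound": record["compound"],
                "ic50_nM": record["ic50"] * factor,
                "source": record["source"],
            })
    return clean, rejected


clean, rejected = harmonise(records)
print(clean)
print(rejected)
```

Keeping the rejected records, together with the reason for rejection, makes gaps in the data visible instead of letting them silently skew a model trained on what remains.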


A large majority of AI techniques are black boxes: a system’s inputs and outputs are known, but its internal workings are not readily understood. The medical and pharmaceutical industries are highly regulated, so this lack of explainability is, unsurprisingly, an issue. Some even argue that it presents a fundamental conflict for scientists.

What does it mean to be applying black-box methods in fields that are all about understanding what’s inside of a black box? . . . We don’t want to just have something that works in science, we want to understand the underlying processes there.

– S. Joshua Swamidass, Washington University

The future of AI in drug discovery

The impact of AI on drug discovery is undeniable. As its prevalence in the industry increases, new questions will arise. How can we create the awareness and education needed for proficient use and development of AI methods in the medical field? Relevant communities will also need to debate intellectual property: who is the inventor of a drug created by AI? Is it the programmer, the user, or the algorithm? Is a drug truly invented if the inventor cannot explain how it came to be? Those in the pharmaceutical industry who are used to more traditional methods may also need reassurance that these new methods complement their skills and knowledge rather than replace them altogether. Nonetheless, if current advancements are any indication of future progress, there is plenty to look forward to. AI is set to be a critical addition to the drug discovery toolkit.



Angelica Beleno

Angelica is a Data Scientist in the Insights & Data practice in the UK. She studied Mathematics (MMath) at the University of St Andrews before joining Capgemini as a graduate. Angelica has two years of experience across the public and private sector, delivering AI-powered solutions to solve key business problems.
