Pharmaceutical company applies AI to monitor online reports of drug side effects


Client: Global pharmaceutical company

Region: Europe

Client challenge: A global pharmaceutical company recognized that traditional reporting methods such as physician notes, peer-reviewed publications, and clinical trials were no longer sufficient to discover all potential adverse events (AEs) of a commercially released drug.

Solution: Capgemini created an application that applies machine learning and deep learning techniques to content from social media platforms, health forums, and similar outlets to identify AEs, complementing current reporting methods.

Benefits:


  • An efficient, automated and extendable solution that captures patients’ observations on drug side effects on social media platforms such as Twitter
  • More reliable identification of unexpected AEs to enable more effective understanding and minimization of harm to patients
  • Improved AE detection, with automatic alerts sent to pharmacovigilance teams for downstream processing

Building an application to better review social media

By applying natural language processing (NLP) algorithms and actively mining social media datasets, the partners aimed to identify AEs efficiently by automatically extracting drug and AE mentions from tweets. The project team faced various linguistic challenges when programming the application to identify the relevant social media content. These included figurative language such as metaphors, irony, and sarcasm; layman's language; fragmented text (for example, wrong punctuation, acronyms, and emoticons); idiomatic expressions and slang; and typos.
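To make these challenges concrete, the following is a minimal sketch of the kind of text normalization that typically precedes model training on tweets. It is an illustration only, not the project's actual preprocessing: the function name `normalize_tweet` and the tiny `LAY_TERMS` lexicon are hypothetical, and figurative language such as irony or sarcasm cannot be handled by rules like these.

```python
import re

# Hypothetical lay-term lexicon (assumption for illustration);
# a production lexicon would be far larger and curated by experts.
LAY_TERMS = {
    "meds": "medication",
    "tummy ache": "stomach pain",
    "cant sleep": "insomnia",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more copies of one character


def normalize_tweet(text: str) -> str:
    """Reduce common tweet noise before feature extraction."""
    text = text.lower()
    text = URL_RE.sub(" ", text)              # drop links
    text = MENTION_RE.sub(" ", text)          # drop @user mentions
    text = text.replace("#", " ")             # keep hashtag words, drop the sign
    text = text.replace("'", "").replace("\u2019", "")  # "can't" -> "cant"
    text = REPEAT_RE.sub(r"\1\1", text)       # "sooooo" -> "soo"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation and emoticons
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    # Map lay expressions to a canonical term.
    for lay, canonical in LAY_TERMS.items():
        text = re.sub(rf"\b{re.escape(lay)}\b", canonical, text)
    return text
```

For example, `normalize_tweet("@bob these MEDS give me a tummy ache!!! http://t.co/x")` yields `"these medication give me a stomach pain"`. The output is not grammatical, but it maps lay wording onto terms a downstream model or lexicon can match.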

To provide the best-suited solution for each task, the project team tested a range of methods. As a first step, it applied classical machine learning algorithms such as Naïve Bayes, random forest, and support vector machines (SVM). It then moved to deep learning models such as bidirectional long short-term memory (BiLSTM) networks, combined with techniques for handling class imbalance such as SMOTE and focal loss. As a result, seven models were successfully implemented and optimized for the two tasks of identifying drug keywords and identifying the effects of the mentioned drug.
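As an illustration of one of the techniques named above, here is a minimal sketch of binary focal loss for a single example. Focal loss scales the usual cross-entropy by (1 - p_t)^gamma, so confidently correct predictions contribute little and rare, hard examples (such as the few tweets that actually report an AE) dominate training. The defaults gamma = 2.0 and alpha = 0.25 are commonly cited values, not settings reported for this project.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p     -- predicted probability of the positive class (e.g. "contains an AE")
    y     -- true label, 1 (positive) or 0 (negative)
    gamma -- focusing parameter; gamma = 0 removes the down-weighting
    alpha -- weight of the positive class (assumed value, not from the project)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)         # avoid log(0)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary cross-entropy, which is a convenient sanity check when wiring it into a training loop.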

To automate the extraction of drug and AE mentions from tweets, the partners applied a bidirectional long short-term memory (BiLSTM) model for sequence tagging, which was further improved by adding conditional random field (CRF) layers. Custom and GloVe embeddings were used to improve the results, which were then optimized further using bidirectional encoder representations from transformers (BERT) and BERT-CRF models.
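Sequence taggers of this kind usually emit one label per token, and the mentions are recovered from those labels afterwards. The sketch below shows that decoding step under the assumption of a BIO scheme with `DRUG` and `AE` labels; the case study does not state which tagging scheme or label names the project used, and `extract_mentions` is a hypothetical helper.

```python
from typing import List, Tuple


def extract_mentions(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags into (entity_type, text) mentions.

    "B-X" opens a mention of type X, "I-X" continues it, "O" is outside.
    An "I-" tag that does not continue an open mention of the same type
    is treated as noise and ignored.
    """
    mentions: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the open mention, if any.
        if current_tokens:
            mentions.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return mentions
```

For the tokens `["ibuprofen", "gave", "me", "a", "stomach", "ache"]` tagged `["B-DRUG", "O", "O", "O", "B-AE", "I-AE"]`, the function returns `[("DRUG", "ibuprofen"), ("AE", "stomach ache")]`.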

Working in close partnership with the client organization, Capgemini delivered the final application, which identifies AEs in tweets about commercially available drugs and automatically extracts drug and AE mentions from those tweets. Both features were developed by Capgemini's data science experts using machine learning and deep learning techniques.

Accurate insights facilitate AE identification

The system has delivered multiple benefits for the company. It can classify whether a given text reports an AE. Furthermore, it can detect drug keywords and identify the side effect attributed to that drug. If an AE is detected, an alert is automatically forwarded to pharmacovigilance teams, who handle the downstream actions on the event. Responding promptly to these alerts helps the organization maintain the highest safety standards and can help prevent the same AE from harming other patients.
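The flow described here (classify the text, extract the drug and AE mentions, then raise an alert) can be sketched as follows. Everything in this example is an assumption for illustration: the `Alert` record, the `build_alerts` helper, the 0.5 threshold, and the naive pairing of every detected drug with every detected AE are not taken from the delivered application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alert:
    """Hypothetical record handed to a pharmacovigilance team."""
    text: str
    drug: str
    adverse_event: str
    score: float


def build_alerts(
    text: str,
    ae_score: float,
    mentions: List[Tuple[str, str]],
    threshold: float = 0.5,  # assumed cut-off, to be tuned on validation data
) -> List[Alert]:
    """Create alerts for a post the classifier flags as reporting an AE.

    ae_score -- classifier probability that the post reports an AE
    mentions -- (entity_type, text) pairs from the tagging model,
                with entity_type either "DRUG" or "AE"
    """
    if ae_score < threshold:
        return []
    drugs = [m for kind, m in mentions if kind == "DRUG"]
    events = [m for kind, m in mentions if kind == "AE"]
    # Naive pairing: every drug with every AE found in the same post.
    return [Alert(text, d, e, ae_score) for d in drugs for e in events]
```

Keeping the classification score on each alert lets reviewers sort or filter the queue by model confidence before taking downstream action.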

Moreover, vital information regarding the type of drug, type of event, and patient metrics can be extracted by the automated tools to gain actionable insights from patient-reported outcomes. An AI-driven system can also handle literature searches by reviewing articles for AEs and automatically generating query records, which eases the workload of pharmacovigilance professionals.
