
Bias in NLP models – identifying and removing bias within word embeddings

Capgemini
29 Mar 2021

According to Wikipedia, “Bias is a disproportionate weight in favour of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.”
Is it possible that NLP models are inherently biased? What could be the cause of bias in them? Can we identify it? Is there anything that we can do about it?

Introduction to word embeddings

In the last decade, we have seen some incredible advances in the NLP field that have drastically improved language translation, search, chatbots, text analysis and more. Much of this can be credited to the invention of word embeddings – a technique that automatically maps words or phrases into real number vectors that contain useful properties such as:

Figure 1: Illustration of a small subset of the embedding space. Bubbles indicate word vectors, while dashed arrows indicate inter-word relationship vectors. Source: Capgemini.
  • Low dimensionality compared to traditional approaches such as bag-of-words. Word embeddings capture the semantic properties of any word in just 100–1000 dimensions – low enough for a wide range of ML models to handle.
  • Within the word embedding space, related words cluster together, while unrelated ones are far apart. This allows us to geometrically measure semantic similarities (or differences) between any two words or phrases by calculating the cosine distance between them.
  • Ability to compute vectors that represent different types of inter-word relationships and morphological properties. For example, by subtracting “man” from “king” we get a vector that is very close to the one we get by subtracting “woman” from “queen” – the resulting vector represents “royalty”! Such a representation can be very useful for comparing the meaning of words directly, or as an input to ML models for downstream text-related tasks (see the sketch after this list).
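
As a minimal sketch of how these properties can be queried in practice, the snippet below uses the gensim library with a pre-trained word2vec model (the file name is only a placeholder; any word2vec-format vectors will do):

```python
import numpy as np
from gensim.models import KeyedVectors

# Load pre-trained word vectors (the file name is a placeholder).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Semantic similarity: cosine similarity between word vectors.
print(kv.similarity("king", "queen"))    # high - related words
print(kv.similarity("king", "cabbage"))  # low - unrelated words

# Inter-word relationships: king - man + woman is closest to queen.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The "royalty" vectors from the example above, compared by hand.
royalty_1 = kv["king"] - kv["man"]
royalty_2 = kv["queen"] - kv["woman"]
cosine = np.dot(royalty_1, royalty_2) / (
    np.linalg.norm(royalty_1) * np.linalg.norm(royalty_2)
)
print(cosine)  # relatively high: the two relationship vectors point in a similar direction
```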

There is a variety of methods for generating word embeddings, but the underlying concept for all of them is based on an old idea: “a word is characterized by the company it keeps” (Firth, 1957).

While this approach works very well for many applications, relying on context to represent words can introduce and even amplify implicit biases found in the training text. Such bias is then likely to propagate to ML models down the pipeline, which can inadvertently cause algorithmic discrimination against various social groups or individuals – an outcome we desperately want to avoid!

This usually happens because our social biases are statistically absorbed into the languages we use, and word embeddings are specifically made to extract the statistical properties of a language. Even relatively neutral sources such as Wikipedia have enough bias to introduce gender, racial and other stereotypes into word embeddings.

In the last few years, we have seen word embedding technology progress even further with transformer-based language modelling techniques such as BERT, RoBERTa, XLNet, GPT-2 and GPT-3. These models allow us to accurately embed not only single words or phrases, but long strings of text such as whole sentences, paragraphs and even documents. Each such advancement brings new applications in the field of NLP, so addressing the bias within embeddings is becoming increasingly important.
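
As an illustrative sketch of sentence-level embeddings (the library and model name here are our assumptions, not something prescribed above), the sentence-transformers package wraps such transformer models so that whole sentences can be compared with the same cosine-similarity logic:

```python
from sentence_transformers import SentenceTransformer, util

# Model name is an assumption; any pre-trained sentence-embedding model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The nurse prepared the medication.",
    "The doctor reviewed the patient's chart.",
    "The stock market fell sharply today.",
]
embeddings = model.encode(sentences)

# Cosine similarity between whole sentences, analogous to word-level comparisons.
print(util.cos_sim(embeddings[0], embeddings[1]))  # semantically close
print(util.cos_sim(embeddings[0], embeddings[2]))  # semantically distant
```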

How can we identify bias?

To quantify bias within word embeddings, NLP researchers have developed a number of bias identification methods. An early “bias direction” approach (Bolukbasi et al. (2016)) measures the cosine distance from a word of interest (e.g., nurse) to a stereotyped group (e.g., female) and to the non-stereotyped group (e.g., male). If the distances are substantially different, we can assume bias in the embeddings. However, this approach, although useful, later turned out to be insufficient, as the metric does not detect more subtle, structural bias that relates to how words are grouped within the embedding space (Gonen et al. (2019)).
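
A rough sketch of this “bias direction” style measurement, reusing the pre-trained KeyedVectors object (kv) loaded in the earlier snippet and using “she”/“he” as stand-ins for the two groups, might look like this:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def bias_score(kv, word, group_a="she", group_b="he"):
    """Difference between the word's similarity to the two group words.

    A score far from zero suggests the word sits noticeably closer to
    one group than to the other in the embedding space.
    """
    return cosine(kv[word], kv[group_a]) - cosine(kv[word], kv[group_b])

for occupation in ["nurse", "secretary", "manager", "programmer"]:
    print(occupation, round(bias_score(kv, occupation), 3))
```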

Newer approaches to bias identification tend to focus on the deeper associations between words. One such approach identifies bias by evaluating coreference resolution systems that rely on word embeddings (Zhao et al. (2018)). Given the sentence “The physician hired the secretary because he was overwhelmed with clients”, the system should associate “he” with “the physician”, as otherwise the sentence would not make sense. Then, we replace “he” with “she” and evaluate the sentence again. If using different pronouns gives us different results, we can infer that the embeddings are biased.
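
A highly simplified sketch of this evaluation idea is given below; resolve_pronoun is a hypothetical stand-in for a real coreference resolution system built on top of the embeddings:

```python
# WinoBias-style check: swap the pronoun and see whether the
# coreference decision changes.
TEMPLATE = ("The physician hired the secretary because "
            "{pronoun} was overwhelmed with clients.")

def resolve_pronoun(sentence: str) -> str:
    """Hypothetical placeholder: return the entity the pronoun links to."""
    raise NotImplementedError("plug in a real coreference resolver here")

def passes_gender_consistency() -> bool:
    """The pronoun should link to 'the physician' for both 'he' and 'she'."""
    answers = {
        pronoun: resolve_pronoun(TEMPLATE.format(pronoun=pronoun))
        for pronoun in ("he", "she")
    }
    return answers["he"] == answers["she"] == "the physician"
```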

Figure 2: Cosine distances from stereotyped occupations to “male” and to “female” vectors in the embedding space (calculated using word2vec on a variety of web-based sources). The distances from nurse/secretary to female are significantly smaller than from nurse/secretary to male, and vice-versa for manager/programmer. This indicates a gender stereotype bias within the word embeddings. Source: Capgemini.

Figure 3: Co-reference resolution tests in the WinoBias dataset. Male and female entities are marked in solid blue and dashed orange, respectively. For each example, the gender is irrelevant for the co-reference decision. Systems must be able to make correct linking predictions in pro-stereotypical scenarios (solid purple lines) and anti-stereotypical scenarios (dashed purple lines) equally well to pass the test. Source: Zhao et al. (2018).

How can we address the bias that we just identified?

Having a reliable bias identification method is an essential first step towards developing effective debiasing algorithms. One of the earliest debiasing algorithms – “hard-debiasing” (again Bolukbasi et al. (2016)) – focuses on neutralising biased words by equalising their distances to the stereotyped and non-stereotyped word groups. However, because it relies on the flawed “bias direction” metric for identifying bias, it only superficially hides bias. After debiasing, the neutralised words are still contained within clusters of words associated with the stereotyped group: words such as receptionist, nurse, teacher and homemaker remain within one cluster, while captain, doctor, professor and programmer remain within another.
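
A minimal sketch of the neutralisation step, again reusing the kv object from the earlier snippets (the full method also derives the gender direction from several definitional pairs via PCA and equalises pairs such as he/she; this shows only the core idea):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Approximate the gender direction from a single definitional pair.
gender_direction = unit(kv["she"] - kv["he"])

def neutralise(vec, direction):
    """Remove the component of `vec` that lies along the bias direction."""
    return vec - np.dot(vec, direction) * direction

# After neutralisation the word carries no component along the gender
# direction, so its distances to "he" and "she" become (roughly) equal.
nurse_debiased = neutralise(kv["nurse"], gender_direction)
print(np.dot(unit(nurse_debiased), gender_direction))  # approximately 0
```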

The latest research on debiasing word embeddings focuses on a number of techniques, such as data augmentation (Lu et al. (2019)), adjusted objective functions during training (Masahiro et al. (2019)), and post-training debiasing (Ethayarajh et al. (2019)). For instance, in the data augmentation approach, we prepare additional training data by replacing gender-identifying words with words of the opposite gender. These augmented examples are then combined with the original data and fed into the model for training. By doing this, we balance out the bias seen in the text with the opposite bias, making the model neutral towards both groups.
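
A toy sketch of such gender-swapping augmentation is shown below; the swap table is deliberately tiny, and a real implementation would need part-of-speech-aware handling of ambiguous words such as “her”:

```python
# Toy counterfactual data augmentation: create a gender-swapped copy of
# each training sentence and train on the union of both versions.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",   # real systems disambiguate her/his by POS
    "his": "her",
    "man": "woman", "woman": "man",
}

def swap_gender(sentence: str) -> str:
    """Replace gender-identifying tokens with their counterparts."""
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

corpus = [
    "he was overwhelmed with clients",
    "the nurse said she would help",
]
augmented_corpus = corpus + [swap_gender(s) for s in corpus]
print(augmented_corpus)
```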

Conclusion

If you’re planning to apply NLP to your business problems, bias is something you should consider very carefully, as ignoring it could have serious ethical consequences. Think about the data sources that were used to train language models – is it possible that they carry a demographic bias? Most likely, the answer is yes, and therefore you should always check for the potential implications of bias and ways to mitigate them. Make sure you’re using accurate methods for identifying bias and choose the right approach to debiasing. Remember that machine learning models are made to generalise their decisions based on data, so if there is any bias in the data, it is likely to get amplified. For an informed, experienced, and current approach to your AI solution and its fairness implications, please contact us.

Through our Capgemini UK’s Ethical AI Guild we provide guidance on ethical issues and practices. Made up of experienced AI practitioners, the guild looks to accelerate our clients’ journeys towards ethical AI applications that benefit all.