Well-trained software contains errors too.
Let’s start with a riddle: “A man and his son are in a terrible accident and are rushed to the hospital in critical condition. The doctor looks at the boy and exclaims, ‘I can’t operate on this boy, he’s my son!’ How could this be?” If you’re stumped, you’re probably biased. (The answer can be found at the bottom of this blog post.)
You’re not the only one whose assumptions get in the way. Lots of people are biased, including the people who create machine learning and artificial intelligence programs. When people make these kinds of errors, so do their creations: the algorithms that make decisions for you. And when biased or incorrect data is fed into these algorithms, they make the wrong decisions too.
But why do these bots contain biases? The answer is simple: the bot is trained on a biased dataset. If an intelligent machine is to understand something, it has to be taught. An algorithm doesn’t know what it hasn’t learned. When you start with machine learning, you start with a blank slate.
Unfortunately, datasets are not always in perfect order. In fact, all datasets contain biases. Bias can enter at every stage: in the underlying model, in which data is collected, in the algorithm itself, in the reporting, and in the humans who use the results.
Certain groups of people may be excluded from the datasets, usually not intentionally, but still noticeably. When this bias leads to discrimination based on gender, ethnicity, or other characteristics, it is cause for concern. For example, Joy Buolamwini from MIT discovered that women with dark skin are not recognized well by facial recognition software, including software that is commercially available.
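A first look at whether groups are missing from a dataset can be as simple as counting. The sketch below is only an illustration: the records, the `group` field, and the 10% threshold are made up for this example, and a real check would use whatever attributes matter for your application.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records as a fraction of the total."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, threshold=0.10):
    """List the groups whose share falls below the (arbitrary) threshold."""
    return sorted(group for group, share in shares.items() if share < threshold)


# Hypothetical training records; only the group label matters here.
records = (
    [{"group": "A"}] * 12
    + [{"group": "B"}] * 7
    + [{"group": "C"}] * 1
)

shares = group_shares(records, "group")
flagged = underrepresented(shares)
print(shares)
print("Underrepresented groups:", flagged)
```

A count like this only shows who is present in the data. It says nothing yet about how well a trained model performs for each group.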
“Human data encodes human biases by default. Being aware of this is a good start (…).” Ben Packer et al. (Google AI)
When we use algorithms to make decisions for other people, it is important to take a good look at the data. But what if you get the software ready-made from the market?
Take state-of-the-art image recognition software as an example. This kind of software now performs excellently: in recognizing everyday objects, it fares better than humans. That is because it has been extensively trained and tested with millions, perhaps billions, of examples. Yet it still makes mistakes, just like people do.
The user benefits from the wealth of experience that the software has already gained at the supplier and with previous users. That is great, because it allows AI projects to advance faster: the systems no longer need to be trained extensively. The downside is that you don’t know how the software learned its trade, or what bias you get out of the box. The software itself (and the software supplier) won’t tell you.
You can unknowingly buy prejudices together with your software. Unknowingly, but you still bear all the consequences of the decisions the software makes. Are you excluding population groups? And what is the risk of ending up in the news because your software has started to discriminate? You don’t want to go viral with stories about how your apps discriminate.
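One way to start answering these questions for ready-made software is to treat it as a black box and measure how often it is right for each group, using your own labelled examples. The sketch below assumes the purchased model can be called as a function; `toy_predict`, the example data, and the group names are hypothetical stand-ins, not any vendor's API.

```python
from collections import defaultdict


def accuracy_by_group(examples, predict):
    """Compute accuracy per group for (features, label, group) examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for features, label, group in examples:
        total[group] += 1
        if predict(features) == label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


def toy_predict(features):
    """Hypothetical stand-in for the purchased model."""
    return features >= 5


# Made-up labelled test examples: (features, true label, group).
examples = [
    (7, True, "A"), (2, False, "A"), (9, True, "A"), (1, False, "A"),
    (4, True, "B"), (3, True, "B"), (8, True, "B"), (2, False, "B"),
]

per_group = accuracy_by_group(examples, toy_predict)
gap = accuracy_gap(per_group)
print(per_group)
print("Accuracy gap between groups:", gap)
```

A large gap does not prove discrimination on its own, but it is a signal to investigate before the software makes decisions about real people.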
* Answer: The doctor is the mother of the boy.