Last week, Bianca wrote on Channels about bias in AI, stating that the problem starts with inherently biased data. She noted that “if your data is biased, either due to the data gathering process or biased sample of your population, your model is going to be biased as well.” The post was timely: AI bias is a topic getting more and more attention, especially in relation to race.
Only yesterday, MIT Technology Review posted an article on how AI programmes are learning to exclude some African American voices. Natural-language processing technology, which sifts through endless written documents to mine public opinion online, relies on an understanding of existing languages to interpret meaning. Trouble arises when we consider slang or particular vernaculars, as the article illustrates:
Brendan O’Connor, an assistant professor at the University of Massachusetts, Amherst, and one of his graduate students, Su Lin Blodgett, looked at the use of language on Twitter. Using demographic filtering, the researchers collected 59.2 million tweets with a high probability of containing African-American slang or vernacular. They then tested several natural-language processing tools on this data set to see how they would treat these statements. They found that one popular tool classified these posts as Danish with a high level of confidence.
“If you analyze Twitter for people’s opinions on a politician and you’re not even considering what African-Americans are saying or young adults are saying, that seems problematic,” O’Connor says.
While the technology itself cannot be said to have a racial bias, since it only operates as instructed with the languages it has been programmed to understand, the fact that certain vernaculars have been overlooked reveals a human bias in whose opinions we prioritise learning about. This may not seem like a major problem, but it is far-reaching, extending to any system that uses language, such as search engines.
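The failure mode the researchers describe can be sketched with a toy character-trigram language identifier. Everything below is an illustrative assumption on my part: the tiny corpora, the scoring, and the two labels stand in for real training data, and this is not the actual tool O’Connor and Blodgett tested.

```python
from collections import Counter

# Toy corpora standing in for training data (illustrative assumptions,
# not the data behind any real language identifier):
CORPORA = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog ran home",
    "danish": "den hurtige brune raev hopper over den dovne hund og saa loeb hunden hjem",
}

def trigram_profile(text):
    """Relative frequency of each character trigram in `text`."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

PROFILES = {lang: trigram_profile(text) for lang, text in CORPORA.items()}

def identify(text):
    """Return whichever known language profile overlaps the input most."""
    doc = Counter(text[i:i + 3] for i in range(len(text) - 2))
    scores = {
        lang: sum(profile.get(g, 0.0) * c for g, c in doc.items())
        for lang, profile in PROFILES.items()
    }
    # The classifier can only ever answer with a language it was trained on:
    # text in a variety missing from the training data is forced into the
    # nearest (wrong) label, with no option to say "none of the above".
    return max(scores, key=scores.get)
```

Because `identify` can only choose among its trained profiles, a tweet written in an unrepresented vernacular is guaranteed a confident-looking wrong answer. The bias is structural, baked in before any scoring happens, which is the point the article makes.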
Search engines themselves have their own problems, as shown in the World White Web project, started by graphic designer Johanna Burai. Burai searched the term ‘hands’ on Google Images and was shown only images of white hands in the search results. Google bases its results on how often images are used and how they are described, so while once again this is not a conscious decision to exclude people of colour, the data provided to the search engine has its own bias. The World White Web attempts to correct this by providing images of non-white hands for people to use, boosting them up the search rankings. The website states:
When you search for images of the word "hand", all you see on Google are white hands, regardless of where you are in the world. World White Web is an initiative that wants to put an end to the norm of whiteness on the Internet. If we all share the images on this site, we can change the search results on Google to include hands of people of color too.
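The dynamic above can be reduced to a toy sketch of usage-frequency ranking. The index, field names, and scoring here are my assumptions about the mechanism described, not Google’s actual pipeline:

```python
# Toy image index (hypothetical data, skewed the way the usage data is):
index = [
    {"description": "white hand", "uses": 900},
    {"description": "white hand", "uses": 850},
    {"description": "hand of color", "uses": 40},
]

def search(query, images, k=2):
    # Match on description text, rank by how often each image has been used.
    hits = [img for img in images if query in img["description"]]
    return sorted(hits, key=lambda img: img["uses"], reverse=True)[:k]

# With skewed usage data, the top results for "hand" are all white hands.
before = [img["description"] for img in search("hand", index)]

# The World White Web strategy in miniature: widespread sharing raises an
# image's usage count, which moves it up the ranking.
index[2]["uses"] += 1000
after = [img["description"] for img in search("hand", index)]
```

Nothing in the ranking function mentions race; the skew comes entirely from the data it is fed, which is exactly why collectively sharing different images can change the results.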
So, what would it mean for a system to be unbiased? As humans, it is probably quite hard for us even to imagine this; even when we try our very best, or are sure we are not being biased in any way, our thoughts and decisions rest on years of ingrained patterns of thought and behaviour. How do we begin to remedy this? Unexpectedly, by becoming more like computers ourselves. Laurie Penny describes this best:
Sometimes we fail to be as fair and just as we would like to be – not because we set out to be bigots and bullies, but because we are working from assumptions we have internalised about race, gender and social difference. We learn patterns of behaviour based on bad, outdated information. That doesn’t make us bad people, but nor does it excuse us from responsibility for our behaviour. Algorithms are expected to update their responses based on new and better information, and the moral failing occurs when people refuse to do the same. If a robot can do it, so can we.