
AI can be unintentionally biased: Data cleaning and awareness can help prevent the problem

Artificial intelligence will never be completely free of bias, but there are ways to make it as unbiased as possible.



Image: Sompong Rattanakunchon / Getty Images


Most artificial intelligence systems strive for 95% accuracy of results when benchmarked against the traditional methods of determining outcomes. But how can organizations safeguard these systems so that the AI doesn’t inadvertently inject bias that affects the accuracy of results?

Bias can be injected into AI by faulty algorithms, by incomplete data on which the algorithms operate or even by machine learning that operates on biased assumptions.


One example is an Amazon recruiting tool that began as an AI project in 2014. The intent of the AI application was to save recruiters time going through resumes. Unfortunately, it wasn’t until one year later that Amazon realized that the new AI recruiting system contained inherent bias against female applicants. This flaw occurred because Amazon had trained the system on historical data from its past ten years of hiring. That data reflected male dominance in the industry: over the prior ten years, men had made up 60% of Amazon employees, and the system learned the same imbalance.

“Programmers and developers can incorporate technology to detect or unlearn bias in AI before it’s deployed,” said Rachel Brennan, senior director of product marketing at Bizagi, which develops intelligent process automation solutions.

Brennan said there is a narrative, mainly played into by pop culture, that bias in AI is a nefarious act done by some secret club. “The thing is, biased AI is typically never a nefarious act,” she said. “It comes straight from the data the AI is trained on. If there is a bias in the data, then it’s being implicitly learned and incorporated.”


One way to proactively limit bias is to double-check the data going into AI and machine learning systems during data preparation.
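As a rough illustration of what such a check during data preparation might look like, the Python sketch below measures how well each group is represented in a training set before any model sees it. The function names, the `gender` field, the sample rows and the 40% threshold are all hypothetical choices made for this example; they do not come from Bizagi or Amazon.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return each group's share of the records for one attribute."""
    counts = Counter(row[attribute] for row in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below the chosen threshold."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Made-up training rows for illustration only: 7 male, 3 female.
training_rows = (
    [{"gender": "male"}] * 7
    + [{"gender": "female"}] * 3
)

shares = group_shares(training_rows, "gender")
# The 0.4 cutoff is an arbitrary assumption; the right value depends on the use case.
flagged_groups = underrepresented(shares, min_share=0.4)
print(shares)
print(flagged_groups)
```

A check like this only surfaces an imbalance. Deciding whether the imbalance matters, and what to do about it, still calls for someone who understands the business context.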

“What we need to remember is that bias is often unintentional, mainly because programmers and developers aren’t explicitly looking for bias,” Brennan said. “A data person is looking at data just as data and might not be able to see that information from a different perspective, like a business perspective, for example. There are so many nuances and factors that can play into data results, and if you’re only looking at the outcome from a data perspective, the biased data can slip by.”

Brennan’s point is well taken. IT staff and data scientists aren’t the experts when it comes to evaluating data for bias. In most cases, the end business knows the subject (and the data) best. There are also algorithms that can scan data for common biases, such as those related to race, gender, religion and socioeconomic status.

SEE: Top 5 biases to avoid in data science (TechRepublic)

“These algorithms can search for and flag potential bias to programmers and developers,” Brennan said. “This, of course, slows down the process, which is why many data scientists might skip the step, but it’s a point of ethics and is crucial if the end AI result is going to be helpful rather than harmful. For example, if the AI is going to determine eligibility for a mortgage loan, it absolutely cannot be biased, and it’s on data scientists to ensure they’ve double checked the information being learned by AI. If it’s AI for a quiz to determine what breed of dog you would prefer, it’s not as imperative.”
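Brennan does not name a specific algorithm, so the Python sketch below is only one possible way to flag potential bias of this kind. It compares approval rates between groups and flags any group whose rate falls below 80% of the highest group's rate, a rule of thumb often called the four-fifths rule. The function names, field names and loan data are invented for this example.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the share of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def flag_disparity(rates, threshold=0.8):
    """Return groups whose rate is below `threshold` times the best rate."""
    best = max(rates.values())
    if best == 0:
        return {}  # nobody was selected, so there is nothing to compare
    ratios = {group: rate / best for group, rate in rates.items()}
    return {group: ratio for group, ratio in ratios.items() if ratio < threshold}


# Hypothetical loan decisions: group A has 8 of 10 approved, group B has 4 of 10.
applications = (
    [{"group": "A", "approved": True}] * 8
    + [{"group": "A", "approved": False}] * 2
    + [{"group": "B", "approved": True}] * 4
    + [{"group": "B", "approved": False}] * 6
)

rates = selection_rates(applications, "group", "approved")
flagged = flag_disparity(rates)
print(rates)
print(flagged)
```

A flag from a check like this is a prompt for human review, not proof of bias. A gap in rates can have legitimate causes that only someone with business knowledge can judge.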

Cleaning data upfront is important to the quality of AI decisions. This includes the initial cleaning of AI data, ongoing vigilance over the data that ML ingests and scrutiny of the follow-up algorithms that operate on it. Throughout all of these processes, business users who are experts in the subject should be involved.

“In the real world, we don’t expect AI to ever be completely unbiased any time soon,” Brennan said. “But AI can be as good as the data and the people who create the data.”

For companies striving for bias-free AI and ML results, this means doing everything humanly possible to vet data and algorithms, and accepting longer project timelines to get both the data and the results right.
