The Problem With Amazon’s AI Recruiter

Resume-Reader Was Biased Against Women. Here's Why.

Oct 31, 2018

SMITH BRAIN TRUST – Amazon last year abandoned an AI recruiting tool that it had built, after the company’s machine-learning experts discovered the tool was discriminating against female job applicants.

The e-commerce giant was wise to do so, says P.K. Kannan, the Dean’s Chair in Marketing Science at the University of Maryland’s Robert H. Smith School of Business. His research into artificial intelligence and machine learning has underscored the dangers of blindly relying on algorithms to make important decisions. “Small errors,” he explains, “like a flaw in the attribution scheme, can have a major impact. With machine learning, if an algorithm is used day in and day out, the mistake builds up.”

According to news reports, the company’s computer models had been trained to judge applicants based on patterns in the resumes that had been submitted to the company over a 10-year period. However, the vast majority of applicants over the period had been men, a fact that mirrors the demographic makeup of the tech industry as a whole. Words associated with women, such as “women’s basketball,” would cause the model to penalize the candidate.

Kannan, who was recently named one of six UMD Distinguished Scholar-Teacher award recipients for the 2018-19 academic year, focuses much of his research on digital marketing, specifically marketing modeling that applies statistical and econometric methods to online and offline marketing data. His current research stream focuses on freemium models, attribution modeling, media mix modeling, new product/service development and customer relationship management.

We asked him to share some of his insights. 

Q: Do you believe that Amazon can correct the failings of its earlier tool? And, given that errors build upon themselves, what must Amazon do to guard against future failures?

A: Amazon’s problem is with the data and not necessarily with the algorithms themselves. Machine learning and AI tools are very good at learning from past data and using it to develop rules for winnowing through applications and selecting the right candidates. But when the past data reflects human selection criteria and a smaller pool of women candidates, there could be problems.

For example, when the number of women applicants in the past data is small, it is difficult for these algorithms to distinguish between acceptable and unacceptable women candidates. Given this, and given the tolerance levels set for Type I errors (false positives) and Type II errors (false negatives), the algorithms may end up tagging more men as acceptable.

Also, human biases tainting the past data may play a role in compounding the error. If, for example, past recruiters deliberately downplayed the importance of some criteria to avoid selecting women candidates, these biases will persist in the AI-based recommendations. And if those biased AI-based selections are then included in the data for the next cycle of selection, such errors can compound.

These failings can be corrected with some effort. As in goal programming, where some criteria are optimized subject to attaining specific goals, recruiting tools can set goals for the percentage of minority or underrepresented candidates retained in the selected pool. Monitoring the algorithms to check that they work as intended would also help.
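
The goal-based idea can be pictured with a minimal Python sketch. It is an illustration only, not Amazon's system or a method described by Kannan: the `Candidate` type, the scores, the `underrepresented` flag and the `select_with_goal` function are all hypothetical, and the percentage goal is expressed as a minimum head count for simplicity.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    score: float            # hypothetical model score; higher is better
    underrepresented: bool  # hypothetical group flag


def select_with_goal(candidates: List[Candidate], k: int, min_count: int) -> List[Candidate]:
    """Pick up to k candidates with the highest total score, subject to
    including at least min_count underrepresented candidates when available."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # Reserve places for the top-scoring underrepresented candidates first.
    quota = max(0, min(min_count, k))
    reserved = [c for c in ranked if c.underrepresented][:quota]
    reserved_ids = {id(c) for c in reserved}
    # Fill the remaining places purely by score.
    rest = [c for c in ranked if id(c) not in reserved_ids][: k - len(reserved)]
    return sorted(reserved + rest, key=lambda c: c.score, reverse=True)


# Made-up pool for illustration.
pool = [
    Candidate("A", 0.95, False),
    Candidate("B", 0.90, False),
    Candidate("C", 0.85, False),
    Candidate("D", 0.80, True),
    Candidate("E", 0.75, False),
    Candidate("F", 0.70, True),
]

unconstrained = select_with_goal(pool, k=4, min_count=0)
with_goal = select_with_goal(pool, k=4, min_count=2)
print([c.name for c in unconstrained])
print([c.name for c in with_goal])
```

In this toy pool, the goal changes one of the four selections. A real tool would have to decide how the goal is set and how scores are validated, which the sketch does not address.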

Q: Machine-learning resume-screening tools have been found to be problematic for other firms as well. Is resume-sorting a compatible task for artificial intelligence? If so, how long until AI resume-screening becomes the norm?

A: As I indicated above, the problem is with the training data. If the training data is clean and unbiased, then these algorithms will work most of the time. Collecting a large volume of training data may take some time, but if the process used to select the candidates who make up the training data is faulty, you will have the same problem all over again.

Researchers are working on techniques to de-bias training data, such as giving more weight to certain criteria, to make the dataset less biased. The important thing to note here is that these techniques assume the algorithm creators are aware of the biases they have. Biases that originate subconsciously cannot be corrected this way. Therein lies the problem with bias in AI-based recommendations.

While the algorithms can be used, the recommendations they produce need to be monitored regularly to ensure that there are no major biases.
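
As a rough illustration of what such monitoring could look like, the Python sketch below compares selection rates across groups and flags a large gap. The function names, the group labels, the made-up decision counts and the `min_ratio` threshold of 0.8 are all assumptions chosen for the example, not figures from Amazon or from Kannan.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of candidates selected in each group.

    decisions is an iterable of (group, selected) pairs."""
    totals: Counter = Counter()
    picked: Counter = Counter()
    for group, selected in decisions:
        totals[group] += 1
        if selected:
            picked[group] += 1
    return {group: picked[group] / totals[group] for group in totals}


def flag_disparity(rates: Dict[str, float], min_ratio: float) -> bool:
    """Flag when the lowest group rate falls below min_ratio times the highest."""
    highest = max(rates.values())
    if highest == 0:
        return False  # nobody was selected, so there is no gap to compare
    return min(rates.values()) / highest < min_ratio


# Made-up screening outcomes for illustration.
decisions = (
    [("men", True)] * 30 + [("men", False)] * 70
    + [("women", True)] * 3 + [("women", False)] * 17
)

rates = selection_rates(decisions)
needs_review = flag_disparity(rates, min_ratio=0.8)  # hypothetical threshold
print(rates, needs_review)
```

A check like this only surfaces a gap in outcomes. Deciding whether the gap reflects bias in the training data still requires human review.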




About the Expert(s)


P. K. Kannan is the Dean's Chair in Marketing Science at the Robert H. Smith School of Business at the University of Maryland. His main research focus is on marketing modeling, applying statistical and econometric methods to marketing data. His current research stream focuses on attribution modeling, media mix modeling, new product/service development and customer relationship management (CRM).
