Researchers found that Google's Perspective API, an AI system designed to detect toxic comments, exhibited significant biases: it incorrectly flagged identity-related terms as toxic and failed to accurately assess incivility in news content.
Researchers from the University of Pennsylvania and other institutions conducted studies of Google's Perspective API, a machine learning system developed by Jigsaw to detect toxic comments online. The API was trained on over a million online comments and deployed by major news organizations including The New York Times, The Guardian, and The Economist.

Multiple independent studies revealed systematic biases in the system: it rated phrases like 'I am a gay black woman' as 87% toxic, while 'I am a man' received a much lower toxicity score. It also flagged identity descriptors such as 'Black,' 'Muslim,' 'feminist,' and 'gay' as toxic even when they were used in neutral contexts.

In testing on news transcripts from PBS NewsHour, MSNBC's Rachel Maddow Show, and Fox News' Hannity, the API failed to distinguish between the levels of incivility perceived by human annotators, agreeing with them only 51% of the time. It could not detect subtle forms of incivility such as sarcasm, while over-predicting toxicity for words commonly used in news reporting. Researchers also demonstrated that the system could be bypassed by simple misspellings: changing 'idiot' to 'idiiot' reduced the toxicity score from 84% to 20%.
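For reference, a minimal sketch of how such probes query the public Perspective API. The endpoint and response shape follow Jigsaw's published documentation; the API key placeholder is hypothetical, and current model versions may return different scores than those the studies reported.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; real keys are issued via Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    """Return the Perspective API TOXICITY probability (0.0-1.0) for `text`."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Probes mirroring the two failure modes described above:
# identity phrases inflating scores, and a misspelling deflating them.
for text in [
    "I am a gay black woman",
    "I am a man",
    "You are an idiot",
    "You are an idiiot",
]:
    print(f"{text!r}: {toxicity_score(text):.0%}")
```

Comparing the scores for the paired inputs is enough to reproduce both findings: the identity phrase scoring far higher than the neutral one, and the misspelled insult scoring far lower than the correctly spelled version.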
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)