Five major commercial automated speech recognition (ASR) systems from Amazon, Apple, Google, IBM, and Microsoft exhibited substantial racial disparities, with average word error rates of 35% for Black speakers compared to 19% for white speakers.
Stanford researchers analyzed five commercial automated speech recognition systems developed by Amazon, Apple, Google, IBM, and Microsoft, using audio from interviews with 42 white speakers and 73 Black speakers across five US cities. The systems were tested on 2,141 matched audio snippets from each group, totaling 19.8 hours of audio. All five ASR systems exhibited substantial racial disparities in performance, with an average word error rate of 35% for Black speakers compared to 19% for white speakers. Microsoft's system performed best overall while Apple's performed worst, but across all five systems error rates for Black speakers were nearly twice those for white speakers. The disparities were particularly pronounced for Black men, who had a 41% error rate compared to 30% for Black women. The study found that 23% of audio snippets from Black speakers resulted in unusable transcripts (error rate above 50%) compared to only 1.6% for white speakers. The researchers traced these disparities to acoustic models rather than language models, suggesting the systems struggle with phonological and prosodic characteristics of African American Vernacular English rather than with vocabulary or grammar differences.
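The study's headline numbers are word error rates (WER). As a minimal sketch of how WER is typically computed (word-level Levenshtein edit distance divided by reference length; this is the standard metric, not a reconstruction of the paper's exact evaluation pipeline):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution and one deletion against a 4-word reference -> WER 0.5,
# i.e. the 50% threshold the study used to flag a transcript as unusable.
print(word_error_rate("a b c d", "a x c"))  # → 0.5
```

A snippet whose WER exceeds 0.5 would fall into the "unusable transcript" category the study reports for 23% of Black speakers' audio.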
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
The accuracy and effectiveness of AI decisions and actions depend on group membership: design decisions in AI systems and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation for affected groups of users.
AI system: due to a decision or action made by an AI system
Unintentional: due to an unexpected outcome from pursuing a goal
Post-deployment: occurring after the AI model has been trained and deployed