OpenAI's Whisper speech-to-text AI system was found to hallucinate violent and harmful phrases when processing speech with longer pauses, particularly affecting people with speech impairments like aphasia.
Cornell researchers analyzed OpenAI's Whisper, an AI-powered speech recognition system released in 2022 and trained on 680,000 hours of audio data. They tested over 13,000 speech clips from AphasiaBank, a repository of audio from people with aphasia and from people without speech impairments, and found that approximately 1% of Whisper's transcriptions contained entirely hallucinated phrases, including violent language with words such as 'terror,' 'knife,' and 'killed' that were not present in the original audio. The system was more likely to hallucinate on speech with longer pauses between words, such as that of people with speech impairments. The hallucinations also included fake websites, random names, address fragments, and YouTuber-style phrases. The researchers hypothesized that the large language model underlying Whisper treats silence as a kind of word, leading to these fabricated outputs. Lead researcher Allison Koenecke warned that such hallucinations could cause significant harm if the system were used in AI-based hiring, courtroom proceedings, or medical settings.
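A minimal sketch of the kind of comparison described above, assuming the open-source openai-whisper Python package: transcribe a clip and flag output words that never occur in the reference (human) transcript. The file name, reference text, and helper names are placeholders for illustration only; this is a crude word-level proxy, not the researchers' actual evaluation pipeline.

```python
import re
import whisper  # pip install openai-whisper

def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation so word-level comparison is fair."""
    return re.findall(r"[a-z']+", text.lower())

def candidate_hallucinations(audio_path: str, reference_transcript: str,
                             model_name: str = "base") -> list[str]:
    """Transcribe a clip with Whisper and return words absent from the reference transcript."""
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    hypothesis = normalize(result["text"])
    reference = set(normalize(reference_transcript))
    # Words in the model output that never occur in the reference are candidate
    # hallucinations (a rough proxy; a real evaluation would align segments and
    # inspect flagged spans manually).
    return [w for w in hypothesis if w not in reference]

if __name__ == "__main__":
    # Placeholder path and reference text.
    flagged = candidate_hallucinations("clip_0001.wav", "uh I went to the store yesterday")
    print("Candidate hallucinated words:", flagged)
```

A word-set comparison like this over-flags minor substitutions, so in practice flagged clips would still need human review to separate ordinary transcription errors from fabricated phrases.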
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
The accuracy and effectiveness of AI decisions and actions depend on group membership: choices in AI system design and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.