OpenAI's Whisper AI transcription tool was found to hallucinate text in its transcriptions, including racist commentary and imagined medical treatments, and has been widely deployed in medical settings despite OpenAI's warnings against high-risk use.
OpenAI's Whisper AI transcription tool, marketed as approaching 'human level robustness and accuracy,' was found to frequently hallucinate content, including fabricated text, racial commentary, violent rhetoric, and imagined medical treatments. Researchers reported hallucinations in 80% of transcriptions examined by a University of Michigan researcher, in about 50% of the more than 100 hours analyzed by a machine learning engineer, and in nearly all of the 26,000 transcripts created by another developer; a separate study of 13,000 clear audio snippets found 187 hallucinations. Despite OpenAI's warnings against use in 'high-risk domains,' medical centers have rushed to adopt Whisper-based tools: more than 30,000 clinicians across 40 health systems use Nabla's Whisper-based transcription tool, which has processed an estimated 7 million medical visits. Whisper is also integrated into ChatGPT and the Oracle and Microsoft cloud platforms, and was downloaded over 4.2 million times in one month from HuggingFace. Researchers found that nearly 40% of the hallucinations were harmful because the speaker could be misinterpreted or misrepresented; examples included fabricated violent content about killing people and knives, racist commentary about Black individuals, and non-existent medications such as 'hyperactivated antibiotics.'
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed