OpenAI's Whisper AI transcription tool was found to hallucinate text in its transcriptions, including racist commentary and imagined medical treatments, and has been widely deployed in medical settings despite OpenAI's warnings against high-risk use.
OpenAI's Whisper AI transcription tool, marketed as approaching 'human level robustness and accuracy,' was found to frequently hallucinate content, including fabricated text, racial commentary, violent rhetoric, and imagined medical treatments. Researchers reported hallucinations in 80% of transcriptions examined by a University of Michigan researcher, in about 50% of the more than 100 hours analyzed by a machine learning engineer, and in nearly all of the 26,000 transcripts created by another developer; a separate study of 13,000 clear audio snippets found 187 hallucinations. Despite OpenAI's warnings against use in 'high-risk domains,' medical centers have rushed to adopt Whisper-based tools: more than 30,000 clinicians across 40 health systems use Nabla's Whisper-based transcription tool, which has processed an estimated 7 million medical visits. Whisper is also integrated into ChatGPT and the Oracle and Microsoft cloud platforms, and was downloaded over 4.2 million times in one month from HuggingFace. Researchers found that nearly 40% of the hallucinations were harmful because the speaker could be misinterpreted or misrepresented; examples included fabricated violent content about killing people and knives, racist commentary about Black individuals, and non-existent medications such as 'hyperactivated antibiotics.'
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed