Researchers found that Microsoft Copilot provided scientifically accurate medical information only 54% of the time, with 42% of answers potentially causing moderate or mild harm and 22% potentially causing death or severe harm.
Researchers from Germany and Belgium conducted a study testing Microsoft Copilot's responses to common medical questions. They posed the 10 most frequently asked patient questions about each of the 50 most prescribed drugs in the United States, generating 500 answers in total (10 questions × 50 drugs), and scored the responses for accuracy and completeness against established medical knowledge. Copilot provided scientifically accurate information only 54% of the time; 24% of answers did not match established medical knowledge, and 3% were completely wrong. In terms of potential harm, 42% of the answers were judged likely to cause moderate or mild harm to patients, and 22% could potentially cause death or severe harm; only 36% were considered harmless. The findings, published as a research paper, highlight concerns about people relying on AI systems for medical advice, particularly those who cannot easily access medical professionals. The incident adds to existing problems with AI search systems, following similar cases of Google's AI providing dangerous recommendations.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.