IBM Watson for Oncology, an AI cancer treatment recommendation system, generated multiple unsafe and incorrect treatment recommendations because it was trained on synthetic patient cases rather than real patient data. In one example, it recommended a dangerous drug for a patient with severe bleeding.
IBM Watson for Oncology, developed by IBM Watson Health in partnership with Memorial Sloan Kettering Cancer Center, was designed to provide AI-powered cancer treatment recommendations to physicians worldwide. Internal IBM documents from 2017 revealed that the system often generated unsafe and incorrect treatment recommendations that conflicted with national treatment guidelines. The system was trained on synthetic or hypothetical patient cases rather than real patient data, and its recommendations reflected the expertise of only one or two specialists for each cancer type. In one example, the system recommended bevacizumab (Avastin) for a 65-year-old man with lung cancer and severe bleeding, even though the drug carries a black box warning against use in bleeding patients. The system was used by 230 hospitals worldwide and was trained to make recommendations for 13 cancer types. Customer feedback was harsh: one doctor called the product 'a piece of s***' and said it could not be used for most cases. IBM had not publicly acknowledged these safety issues and continued to market the system as being based on real patient data. No actual patient harm was reported in the documents, though medical experts noted the potential for serious adverse events.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system: Due to a decision or action made by an AI system
Unintentional: Due to an unexpected outcome from pursuing a goal
Post-deployment: Occurring after the AI model has been trained and deployed
No population impact data reported.