During the COVID-19 pandemic, researchers developed hundreds of AI predictive tools for COVID diagnosis and patient risk assessment, but studies found that none were fit for clinical use due to poor data quality and methodological errors; some potentially harmful tools were nonetheless deployed in hospitals.
During the COVID-19 pandemic, beginning in March 2020, AI researchers worldwide rushed to develop predictive tools to help hospitals diagnose patients and assess COVID risk. Multiple comprehensive studies published in 2021, including a review by Laure Wynants at Maastricht University covering 232 algorithms and an analysis by Derek Driggs at Cambridge University of 415 deep-learning models for medical imaging, found that none of the hundreds of tools developed were fit for clinical use. The AI systems suffered from fundamental flaws: training on mislabeled data; 'Frankenstein datasets' stitched together from multiple sources, whose duplicate entries caused models to be tested on their own training data; and algorithms that learned to identify irrelevant features, such as patient position, hospital-specific text fonts, or the presence of children in scans, rather than signs of COVID itself. Despite these problems, some tools were already being used in hospitals and marketed by private developers under nondisclosure agreements, preventing proper evaluation. Wynants expressed concern that 'they may have harmed patients' through missed diagnoses or underestimated risk for vulnerable patients. The failure was attributed to poor collaboration between AI researchers lacking medical expertise and medical researchers lacking mathematical skills, rushed development timelines, and inadequate data-sharing protocols during the health crisis.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, making them prone to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
Human
Due to a decision or action made by humans
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.