Hundreds of AI models developed for COVID-19 diagnosis and prognosis from medical imaging were found to be fundamentally flawed due to methodological errors, poor data quality, and lack of external validation, creating a credibility crisis in medical AI research.
During the COVID-19 pandemic, researchers rapidly published over 400 AI models intended to diagnose COVID-19 or predict patient outcomes from chest X-rays and CT scans. A systematic review by University of Cambridge researchers examined models published between January 1, 2020, and October 3, 2020, and concluded that every one was fatally flawed and unsuitable for clinical use. Common problems included training on small, non-diverse datasets; reusing the same data for training and testing; lack of external validation on different patient populations; and poor documentation of methods. Only 62 of the 415 initially screened studies passed basic quality screening, and 55 of those were still judged to be at high risk of bias.

Some errors were stark: several models used pediatric pneumonia images as their non-COVID examples while all of their COVID-19 images came from adults, so the models likely learned to distinguish children from adults rather than to detect disease, producing artificially impressive but meaningless results.

The researchers noted that similar methodological problems pervade medical AI research, not just COVID-19 studies. A separate investigation found that only 73 of 161 FDA-approved AI medical products publicly disclosed how much data had been used to validate them, and just seven reported the racial demographics of their study populations. This flood of flawed research has created what researchers call a 'polluted area of research' that undermines trust in medical AI and could worsen patient care if unreliable algorithms are deployed clinically.
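To make the evaluation flaws concrete, the sketch below contrasts three ways of scoring a classifier: resubstitution (scoring on the training data, the leakage pitfall described above), a properly held-out internal test set, and validation on an external cohort from a different site. This is an illustrative toy on synthetic data, not code from any reviewed study; the cohort names, feature counts, and the choice of scikit-learn's LogisticRegression are all assumptions for the example.

```python
# Illustrative sketch only: synthetic data, hypothetical cohorts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "internal" cohort: 500 patients, 20 imaging-derived features.
X_internal = rng.normal(size=(500, 20))
y_internal = (X_internal[:, 0] + rng.normal(scale=2.0, size=500) > 0).astype(int)

# Synthetic "external" cohort from a different hospital, with a shifted
# feature distribution standing in for a different scanner or population.
X_external = rng.normal(loc=0.5, size=(300, 20))
y_external = (X_external[:, 0] + rng.normal(scale=2.0, size=300) > 0).astype(int)

model = LogisticRegression(max_iter=1000)

# Pitfall: fit and score on the same data, giving an optimistically biased AUC.
model.fit(X_internal, y_internal)
auc_leaky = roc_auc_score(y_internal, model.predict_proba(X_internal)[:, 1])

# Better: hold out an internal test set the model never sees during training...
X_train, X_test, y_train, y_test = train_test_split(
    X_internal, y_internal, test_size=0.3, random_state=0, stratify=y_internal)
model.fit(X_train, y_train)
auc_internal = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# ...and report performance on a fully external cohort as well.
auc_external = roc_auc_score(y_external, model.predict_proba(X_external)[:, 1])

print(f"Resubstitution AUC (leaky): {auc_leaky:.2f}")
print(f"Held-out internal test AUC: {auc_internal:.2f}")
print(f"External validation AUC:    {auc_external:.2f}")
```

The gap between the resubstitution score and the held-out scores is the kind of inflation the review flagged when studies reused training data for testing; reporting the external-cohort score is what most of the reviewed models omitted.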
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions are exposed to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Entity: Human (due to a decision or action made by humans)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.