Research revealed that AI diagnostic models trained on widely used chest X-ray datasets, including models from companies like Google, exhibit systematic bias against women, minorities, and patients with Medicaid insurance, producing disparate diagnostic accuracy across demographic groups.
Researchers from the University of Toronto, the Vector Institute, and MIT analyzed bias in chest X-ray datasets used to train AI diagnostic models developed by companies including Google, Qure.ai, Aidoc, and DarwinAI. These AI systems classify chest X-rays to identify conditions such as fractures and collapsed lungs, and several hospitals, including Mount Sinai, have piloted such algorithms for coronavirus patients. The researchers examined four major datasets: MIMIC-CXR (over 370,000 images), Stanford's CheXpert (over 223,000 images), the NIH's ChestX-ray14 (over 112,000 images), and an aggregate dataset drawn from over 129,000 patients. After training classifiers to near-state-of-the-art performance, they identified meaningful patterns of bias across all datasets. Female patients suffered the highest disparity despite representing nearly half the data. White patients, comprising 67.6% of images, were the most-favored group, while Hispanic patients were the least-favored. Patients with Medicaid insurance, representing only 8.98% of images, faced bias and often received incorrect diagnoses. The researchers noted limitations, including potential labeling errors from natural language processing and possible confounding from imaging device quality and demographics. They recommended rigorous fairness analyses before deployment and clear disclaimers about dataset bias for clinical use.
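Fairness analyses of this kind commonly compare a classifier's true positive rate (TPR) across demographic groups: a group with a lower TPR is more often underdiagnosed when disease is actually present. The sketch below is a minimal illustration of that comparison, not the study's code; the data, group labels, and function names are hypothetical.

```python
import numpy as np
import pandas as pd

def tpr_by_group(y_true, y_pred, groups):
    """Per-group true positive rate: among truly positive cases,
    the fraction the classifier flagged as positive."""
    df = pd.DataFrame({"y": y_true, "p": y_pred, "g": groups})
    positives = df[df["y"] == 1]
    return positives.groupby("g")["p"].mean()

def tpr_disparity(y_true, y_pred, groups):
    """Gap between the most- and least-favored groups (0 = parity)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return rates.max() - rates.min()

# Toy example (hypothetical labels): the classifier misses more
# positive cases in group "B" than in group "A".
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"])

print(tpr_by_group(y_true, y_pred, groups))   # A: 1.00, B: 0.33
print(tpr_disparity(y_true, y_pred, groups))  # gap of about 0.67
```

In practice such a gap would be computed per diagnostic label and per attribute (sex, race, insurance type), and a non-trivial disparity would argue for the pre-deployment fairness audit the researchers recommend.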
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
The accuracy and effectiveness of AI decisions and actions depend on group membership: design choices in the AI system and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation for affected users.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Pre-deployment
Occurring before the AI is deployed