Research revealed that AI diagnostic models trained on widely used chest X-ray datasets, including models from companies like Google, exhibit systematic bias against women, minorities, and patients with Medicaid insurance, producing disparate diagnostic accuracy across demographic groups.
Researchers from the University of Toronto, the Vector Institute, and MIT analyzed bias in chest X-ray datasets used to train AI diagnostic models developed by companies including Google, Qure.ai, Aidoc, and DarwinAI. These AI systems classify chest X-rays to identify conditions such as fractures and collapsed lungs, and several hospitals, including Mount Sinai, have piloted such algorithms for coronavirus patients. The researchers examined four major datasets: MIMIC-CXR (over 370,000 images), Stanford's CheXpert (over 223,000 images), the NIH's ChestX-ray14 (over 112,000 images), and an aggregate dataset drawn from over 129,000 patients. After training classifiers to near-state-of-the-art performance, they identified meaningful patterns of bias across all datasets. Female patients suffered the highest disparity despite representing nearly half the data. White patients, comprising 67.6% of images, were the most-favored group, while Hispanic patients were the least-favored. Patients with Medicaid insurance, representing only 8.98% of images, faced bias and often received incorrect diagnoses. The researchers noted limitations, including potential labeling errors from natural language processing and possible confounding from imaging device quality and demographics. They recommended rigorous fairness analyses before deployment and clear disclaimers about dataset bias for clinical use.
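Fairness analyses of this kind commonly compare a classifier's true positive rate (TPR) across demographic groups: a group with a lower TPR is more often underdiagnosed when disease is actually present. The sketch below is a minimal illustration of that comparison, not the study's code; the data, group labels, and function names are hypothetical.

```python
import numpy as np
import pandas as pd

def tpr_by_group(y_true, y_pred, groups):
    """Per-group true positive rate: among truly positive cases,
    the fraction the classifier flagged as positive."""
    df = pd.DataFrame({"y": y_true, "p": y_pred, "g": groups})
    positives = df[df["y"] == 1]
    return positives.groupby("g")["p"].mean()

def tpr_disparity(y_true, y_pred, groups):
    """Gap between the most- and least-favored groups (0 = parity)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return rates.max() - rates.min()

# Toy example (hypothetical labels): the classifier misses more
# positive cases in group "B" than in group "A".
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"])

print(tpr_by_group(y_true, y_pred, groups))   # A: 1.00, B: 0.33
print(tpr_disparity(y_true, y_pred, groups))  # gap of about 0.67
```

In practice such a gap would be computed per diagnostic label and per attribute (sex, race, insurance type), and a non-trivial disparity would argue for the pre-deployment fairness audit the researchers recommend.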
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
The accuracy and effectiveness of AI decisions and actions depend on group membership: design choices in the AI system and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation for affected users.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Pre-deployment
Occurring before the AI is deployed