Researchers at Carnegie Mellon and George Washington University discovered that unsupervised AI image models like OpenAI's iGPT and Google's SimCLR encode gender and racial biases from their training data, even without human-labeled images.
Ryan Steed of Carnegie Mellon University and Aylin Caliskan of George Washington University studied two unsupervised learning algorithms: OpenAI's iGPT (a version of GPT-2 trained on pixels rather than words) and Google's SimCLR. Both learn from unlabeled images, without human annotations. The researchers adapted techniques previously used to measure bias in natural language processing models, probing the embeddings (mathematical representations that cluster similar content together) that each model produces.

They found that both systems exhibited stereotypical associations resembling those measured in human Implicit Association Tests: photos of men clustered closer to images of ties and suits, while photos of women sat farther from them. Even without human-created labels, the images themselves, scraped from internet datasets, encode harmful stereotypes, driven by the overrepresentation of certain demographics and by stereotypical portrayals online.

The findings have concerning implications for downstream applications, particularly when these models are fine-tuned for sensitive uses such as hiring, policing, or other consequential decision-making systems. The researchers warn of potential harm when such biased models are deployed in real-world applications and call for greater transparency, more testing before deployment, and more responsible dataset curation practices.
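The association tests the summary describes follow the WEAT family of methods from Caliskan et al.'s NLP bias work: given two sets of target embeddings and two sets of attribute embeddings, compute how much closer (by cosine similarity) each target set sits to one attribute set than the other. A minimal sketch of that effect-size calculation, using toy 2-D vectors as hypothetical stand-ins for real image embeddings (all names and numbers here are illustrative, not from the study):

```python
import numpy as np

def cos(u, v):
    # cosine similarity between two embedding vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # mean similarity of w to attribute set A minus attribute set B
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # WEAT-style effect size: difference of mean associations for target
    # sets X and Y, normalized by the pooled standard deviation.
    # Values range roughly from -2 to 2; larger magnitude = stronger bias.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy embeddings (hypothetical): X/Y stand in for two demographic groups,
# A/B for two attribute categories (e.g. business attire vs. a contrast set).
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]

print(weat_effect_size(X, Y, A, B))  # positive: X clusters closer to A
```

The iEAT in the study applies this same statistic to embeddings extracted from image models instead of word vectors; a significance test via permutation of the target sets typically accompanies the effect size.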
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Pre-deployment
Occurring before the AI is deployed
No population impact data reported.