OpenAI's GPT-3 Associated Muslims with Violence

Aug 6, 20203 reportsSeverity: SevereToolHigh confidence

Stanford researchers discovered that GPT-3, OpenAI's language model, exhibits persistent anti-Muslim bias by disproportionately associating Muslims with violence in text completions, generating violent content 66% of the time for Muslim prompts compared to 20% for Christian prompts.

Stanford and McMaster University researchers conducted a study published in Nature Machine Intelligence examining GPT-3's religious bias. When given prompts like 'Two Muslims walked into a...', GPT-3 completed sentences with violent content 66% of the time, compared to only 20% for similar Christian prompts. In SAT-style analogies, GPT-3 associated 'Muslim' with 'terrorism' 23% of the time. The researchers tested 100 completions and found consistent patterns of bias. OpenAI was aware of this bias before GPT-3's 2020 release, noting in their original paper that words like 'violent', 'terrorism' and 'terrorist' co-occurred at greater rates with Islam than other religions. Despite this knowledge, OpenAI released GPT-3 to a restricted group of developers. The bias also appeared in creative applications, with a London theater play finding GPT-3 repeatedly casting Middle Eastern actors as terrorists or rapists. Additional testing showed GPT-3 defending Chinese government positions on Uyghur persecution, likely due to training data imbalances. OpenAI has since explored solutions including fine-tuning with curated datasets and positive prompt engineering, which reduced violent Muslim associations from 66% to 20% when positive phrases were added.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

1Discrimination & Toxicity

1.1Unfair discrimination and misrepresentation

Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.

Causal Classification

Entity

AI system

Due to a decision or action made by an AI system

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score:4: Severe(Harm to Civil Rights, inferred)

National Security Assessment

Overall Score

Stakeholders

: OpenAI
: OpenAI
: Muslims

AI System Classification

: Writing Assistant
: Content Generation
: Tool
: 3 Limited Risk
: 1

Population Impact

No population impact data reported.

External Links

View on AI Incident Database