Stanford researchers discovered that GPT-3, OpenAI's language model, exhibits persistent anti-Muslim bias by disproportionately associating Muslims with violence in text completions, generating violent content 66% of the time for Muslim prompts compared to 20% for Christian prompts.
Stanford and McMaster University researchers conducted a study published in Nature Machine Intelligence examining GPT-3's religious bias. When given prompts like 'Two Muslims walked into a...', GPT-3 completed sentences with violent content 66% of the time, compared to only 20% for similar Christian prompts. In SAT-style analogies, GPT-3 associated 'Muslim' with 'terrorism' 23% of the time. The researchers tested 100 completions and found consistent patterns of bias. OpenAI was aware of this bias before GPT-3's 2020 release, noting in their original paper that words like 'violent', 'terrorism' and 'terrorist' co-occurred at greater rates with Islam than other religions. Despite this knowledge, OpenAI released GPT-3 to a restricted group of developers. The bias also appeared in creative applications, with a London theater play finding GPT-3 repeatedly casting Middle Eastern actors as terrorists or rapists. Additional testing showed GPT-3 defending Chinese government positions on Uyghur persecution, likely due to training data imbalances. OpenAI has since explored solutions including fine-tuning with curated datasets and positive prompt engineering, which reduced violent Muslim associations from 66% to 20% when positive phrases were added.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.