GPT-3 generated harmful biased content when prompted with single words related to marginalized groups including Jews, Black people, women, and the Holocaust.
A researcher tested GPT-3's content generation capabilities by prompting it with single words related to sensitive topics and marginalized groups, specifically 'Jews', 'black', 'women', and 'holocaust'. The AI system generated tweets containing harmful biases in response to these prompts. The researcher documented these outputs and shared them publicly to demonstrate the unsafe nature of the model due to its embedded biases. The incident was reported as evidence that natural language generation models require more progress on responsible AI before being deployed in production environments. The specific content of the generated tweets and the exact nature of the biases were not detailed in the report, but the researcher characterized them as demonstrating harmful biases that make the system unsafe for production use.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed