Genderify’s AI to Predict a Person’s Gen…

BackThoughts App Allegedly Created Toxic Tweets

Thoughts App Allegedly Created Toxic Tweets

Jul 18, 20201 reportSeverity: MinorToolMedium confidence

GPT-3 generated harmful biased content when prompted with single words related to marginalized groups including Jews, Black people, women, and the Holocaust.

A researcher tested GPT-3's content generation capabilities by prompting it with single words related to sensitive topics and marginalized groups, specifically 'Jews', 'black', 'women', and 'holocaust'. The AI system generated tweets containing harmful biases in response to these prompts. The researcher documented these outputs and shared them publicly to demonstrate the unsafe nature of the model due to its embedded biases. The incident was reported as evidence that natural language generation models require more progress on responsible AI before being deployed in production environments. The specific content of the generated tweets and the exact nature of the biases were not detailed in the report, but the researcher characterized them as demonstrating harmful biases that make the system unsafe for production use.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

1Discrimination & Toxicity

1.1Unfair discrimination and misrepresentation

Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.

Causal Classification

Entity

AI system

Due to a decision or action made by an AI system

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score:2: Minor(Toxic or Malicious Content, inferred)

National Security Assessment

Overall Score

Stakeholders

: OpenAI
: Satria Technologies
: Thoughts Users, Twitter Users

AI System Classification

: Social Media Content Generation
: Tool
: 3 Limited Risk
: 1

Population Impact

: 1

External Links

View on AI Incident Database