Players Manipulated GPT-3-Powered Game to Generate Sexually Explicit Material Involving Children

Apr 1, 2021 | 1 report | Severity: Minor | Collaborator | High confidence

OpenAI discovered that its text-generation technology powering AI Dungeon was being used to generate sexual content involving children. The company then required Latitude to implement content moderation, which triggered a user backlash.

In December 2019, Utah-based startup Latitude launched AI Dungeon, a text-based adventure game that used OpenAI's text-generation technology to let players create personalized stories. In July 2020, the game upgraded to OpenAI's more powerful commercial GPT-3 model.

In April 2021, OpenAI discovered through a new monitoring system that some players were prompting the AI to generate stories depicting sexual encounters involving children. OpenAI immediately demanded that Latitude take action, stating that this use violated its acceptable use policies. Latitude implemented a new content moderation system that automatically flagged certain words and allowed manual review of flagged content.

The moderation system proved oversensitive, blocking innocent phrases such as '8-year-old laptop' and triggering widespread user complaints about privacy violations and the censorship of adult content. Users protested on social media, claiming the company was scanning private fictional content and betraying their trust. The incident highlighted the challenge of moderating AI-generated content while respecting users' privacy expectations. OpenAI now requires Latitude to use OpenAI's own filtering technology. The game attracts more than 20,000 daily players, and an analysis of sample content found that 31 percent of adventures contained sexually explicit material.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

1. Discrimination & Toxicity

1.2 Exposure to toxic content

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

Causal Classification

Entity

AI system

Due to a decision or action made by an AI system

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score: 2 (Minor): Financial Loss, direct

National Security Assessment

Overall Score

Stakeholders

: OpenAI, Latitude
: Latitude
: Latitude

AI System Classification

: Game Content Generation
: Writing Assistant
: Collaborator
: 2 High Risk
: 1

Population Impact

: 20,000
: 20,000

External Links

View on AI Incident Database