Twitter's machine learning content moderation tools mistakenly identified rocket launch photos as intimate content, leading to multiple space-related accounts being suspended from the platform.
Twitter's automated content moderation system, which relies on machine learning tools, misclassified legitimate rocket launch photos as intimate or pornographic content, apparently mistaking the visual characteristics of rockets for inappropriate imagery. Several prominent space-related accounts were affected, including Spaceflight Now, Michael Baylor of NASASpaceflight, Starbase Watcher, and photographer John Kraus, who had posted NASA's Artemis I launch video. The system automatically suspends accounts when it is at least 95% certain that content violates platform rules, and according to a former Twitter employee, the tools were already known to misidentify benign pictures containing flesh-colored pixels as pornographic content. Twitter flagged the posts as 'violating our rules against posting or sharing privately produced/distributed intimate media of someone without their express consent.' All affected accounts were eventually unlocked after the errors were discovered, and Elon Musk acknowledged the issue, stating that 'our image recognition needs some work.' The incident occurred during a period when Twitter had laid off approximately half of its workforce, including content moderation staff, following Musk's acquisition in late October 2022.
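The reporting implies a simple decision rule: an image classifier produces a confidence score, and a score of 95% or higher on an intimate-media label triggers an automatic suspension with no human review. A minimal sketch of that rule follows, assuming a hypothetical ModerationResult type, moderate function, and review queue; these names and the exact threshold handling are illustrative assumptions, not Twitter's actual implementation.

from dataclasses import dataclass

# Hypothetical sketch of the threshold-based auto-moderation described in the
# report. Names and structure are assumptions, not Twitter's actual code.

AUTO_SUSPEND_THRESHOLD = 0.95  # reported confidence level for automatic action

@dataclass
class ModerationResult:
    label: str         # e.g. "intimate_media"
    confidence: float  # classifier confidence in [0.0, 1.0]

def moderate(result: ModerationResult) -> str:
    """Decide what to do with a flagged post."""
    if result.label == "intimate_media" and result.confidence >= AUTO_SUSPEND_THRESHOLD:
        return "suspend_account"         # automatic, no human in the loop
    elif result.label == "intimate_media":
        return "queue_for_human_review"  # lower-confidence cases escalate instead
    return "allow"

# Example: a false positive like the Artemis I launch footage, where a rocket
# plume misread as skin tones pushes the classifier past the threshold.
print(moderate(ModerationResult(label="intimate_media", confidence=0.97)))
# -> suspend_account

Under such a hard cutoff, the failure mode the former employee described, flesh-colored pixels inflating the classifier's confidence, would suspend accounts outright rather than routing the content to a reviewer, which is consistent with the suspensions reported here.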
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leading to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)