Social media platforms' automated content moderation systems failed to detect racist emoji-based attacks against Black England soccer players after the team's loss in a major final, requiring manual user reporting and community intervention to address the hate speech.
After England's loss in the European Championship final on July 11, 2021, three Black players (Marcus Rashford, Jadon Sancho, and Bukayo Saka) were targeted with racist abuse on Facebook, Instagram, and Twitter. The attacks included monkey and banana peel emojis used as racist markers. Facebook's AI systems initially failed to recognize these emoji combinations as hate speech, instead marking them as 'benign comments.' The platforms relied on user reporting rather than proactive detection: Twitter removed 1,000 posts in 24 hours, while Facebook declined to specify removal numbers. Instagram chief Adam Mosseri acknowledged the systems were 'mistakenly marking some of these as benign comments.' Facebook noted that between January and March 2021 it had removed over 25 million pieces of hate speech content on Facebook (97% before user reports) and 6.3 million on Instagram (93% before reports), but this incident revealed gaps in emoji-based hate speech detection. The companies updated their systems after the incident to better detect similar attacks, and community members counteracted the hate by posting supportive messages and reporting offensive content.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice about or encouraging harmful actions. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed