Facebook's AI content moderation systems failed to detect violent hate speech in advertisements targeting Kenya: ads calling for killings and containing dehumanizing language were approved for publication despite the company's claims of improved detection capabilities.
Global Witness and Foxglove tested Facebook's content moderation systems by submitting advertisements containing violent hate speech targeting Kenya. The ads, submitted in both English and Swahili, contained calls for beheadings, rape, and bloodshed, and compared people to animals. The Swahili-language ads were approved outright by Facebook's AI detection systems. The English ads were initially rejected, but only for profanity and grammar errors; once those were corrected while the hate speech content was kept intact, the ads were approved.

This is the third such test Facebook has failed, following similar failures in Myanmar and Ethiopia, where test ads used slurs and called for killings of ethnic groups. Meta claims to have dedicated teams of Swahili speakers and proactive detection technology, and reported taking action on over 37,000 pieces of hate speech content and 42,000 pieces of violence and incitement content in Kenya in the six months leading up to April 2022. However, when Global Witness resubmitted the ads after Meta's July blog post about its election preparations, they were again approved. The testing occurred as Kenya prepared for national elections in August 2022, a period when hate speech poses particular risks of real-world violence.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing users to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.