Facebook's AI content moderation systems failed to detect violent hate speech in advertisements targeting Kenya: ads calling for killings and containing dehumanizing language were approved for publication despite the company's claims of improved detection capabilities.
Global Witness and Foxglove tested Facebook's content moderation systems by submitting advertisements containing violent hate speech targeting Kenya. The ads, submitted in both English and Swahili, contained calls for beheadings, rape, and bloodshed, and compared people to animals. The Swahili-language ads were approved outright by Facebook's AI detection systems. The English ads were initially rejected, but only for profanity and grammar errors; once those were corrected while the hate speech content was kept intact, the ads were approved.

This is the third such test Facebook has failed, following similar failures in Myanmar and Ethiopia, where test ads used slurs and called for killings of ethnic groups. Meta claims to have dedicated teams of Swahili speakers and proactive detection technology, and reported taking action on over 37,000 pieces of hate speech content and 42,000 pieces of violence and incitement content in Kenya in the six months leading up to April 2022. However, when Global Witness resubmitted the ads after Meta's July blog post about its election preparations, they were again approved. The testing occurred as Kenya prepared for national elections in August 2022, a period when hate speech poses particular risks of real-world violence.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing users to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.