Facebook's automated content moderation system approved 15 out of 20 test advertisements containing explicit death threats against election workers, while TikTok and YouTube rejected all such ads and suspended the test accounts.
In November 2022, researchers from Global Witness and NYU's Cybersecurity for Democracy tested the ability of Facebook, TikTok, and YouTube to detect violent content by submitting 20 advertisements containing death threats against election workers ahead of the US midterm elections. The ads contained real examples of threats that had been reported in the media, including statements that people would be killed, hanged, or executed, and that children would be molested, with threats submitted in both English and Spanish. Facebook's automated moderation system approved 15 of the 20 ads (9 of 10 English ads and 6 of 10 Spanish ads) for publication, while TikTok and YouTube rejected all of the ads and suspended the researcher accounts for policy violations. The researchers removed the approved ads before they went live to prevent spreading violent content. The test used clear, unambiguous language sourced from actual threats against election workers, with profanity removed and grammar corrected. Meta responded that this was a small sample not representative of what users see and claimed that its moderation capabilities exceed those of other platforms, though it could not provide evidence supporting this claim when asked.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas requiring moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.