Facebook's automated content moderation system approved 15 out of 20 test advertisements containing explicit death threats against election workers, while TikTok and YouTube rejected all such ads and suspended the test accounts.
In November 2022, researchers from Global Witness and NYU's Cybersecurity for Democracy tested the ability of Facebook, TikTok, and YouTube to detect violent content by submitting 20 advertisements containing death threats against election workers ahead of the US midterm elections. The ads contained real examples of threats that had been reported in the media, including statements that people would be killed, hanged, or executed, and that children would be molested, with threats submitted in both English and Spanish. Facebook's automated moderation system approved 15 of the 20 ads (9 of 10 English ads and 6 of 10 Spanish ads) for publication, while TikTok and YouTube rejected all of the ads and suspended the researcher accounts for policy violations. The researchers removed the approved ads before they went live to prevent spreading violent content. The test used clear, unambiguous language sourced from actual threats against election workers, with profanity removed and grammar corrected. Meta responded that this was a small sample not representative of what users see and claimed that its moderation capabilities exceed those of other platforms, though it could not provide evidence supporting this claim when asked.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas requiring moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.