Social media platforms are using AI algorithms to automatically remove content that may document war crimes and human rights violations, preventing investigators, journalists, and civil society organizations from accessing crucial evidence for accountability efforts.
Social media platforms including Facebook, YouTube, and Twitter have implemented AI-powered content moderation systems that automatically identify and remove posts deemed to violate their policies, including content classified as terrorist, violent extremist, hate speech, or violent threats. These systems can remove content so quickly that no user sees it before takedown; YouTube reported that in Q2 2019, 80% of flagged videos were deleted before anyone had viewed them.

Human Rights Watch found that 619 of the 5,396 pieces of content (about 11%) it had cited in reports since 2007 had been removed from platforms. The Syrian Archive found that the takedown rate for Syrian human rights documentation on YouTube rose from about 13% to 20% since the beginning of 2020, with more than 350,000 videos gone by May 2020. Specific cases include Syrian journalist Baraa Razzouk, who had more than a dozen videos documenting protests and attacks deleted; Ukrainian journalist Ihor Zakharenko, whose footage of civilian casualties was removed within minutes; and Syrian humanitarian worker Yahya Daoud, whose entire Facebook account was automatically deleted. When the BBC tested uploading Ukrainian war footage, Instagram removed three of the four videos within a minute, and YouTube initially applied age restrictions before removing all of them.

The report indicates that these AI systems lack the contextual understanding to distinguish content that documents human rights violations from content that promotes violence.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed