Social media platforms Facebook and Twitter failed to adequately moderate hate speech, harassment, and violent content in Balkan languages, with AI content moderation systems performing poorly on non-English content and leaving harmful material online despite user reports.
A study by the Balkan Investigative Reporting Network (BIRN) examined how effectively Facebook and Twitter moderated hate speech, harassment, and violent content in Balkan languages. Of the hate speech reports examined, 57% resulted in confirmed violations while 28% were deemed non-violations, and many accounts confirmed as violating the rules remained online. For targeted harassment, 50% of reports received violation confirmations while 16% were told the content did not violate the rules; for threats of violence, only 40% received violation confirmations while 60% received only an acknowledgment. One respondent reported seven accounts for hateful and violent content; Twitter confirmed the violations, yet six of the accounts remained online.

Facebook reported that its proactive hate speech detection improved from 23.6% in late 2017 to 95% at the time of the study, but it provided no language-specific data. Both platforms rely on AI systems for content moderation, which experts noted perform poorly on non-English languages, particularly those using non-Roman scripts. In one incident, in May 2018, Facebook blocked Bosnian journalist Dragan Bursac for 24 hours after he posted a historical photo of a detention camp, determining that it violated community standards. The study highlighted that smaller language groups, such as those in the former Yugoslavia, lack the user numbers to incentivize investment in human moderation, leaving these languages with inadequate AI-only approaches.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leaving them prone to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)