AI content moderation systems on social media platforms are driving users, particularly marginalized creators, to develop 'algospeak', coded language for evading automated filters, leading to censorship of legitimate content and forced linguistic adaptation.
Social media platforms including TikTok, YouTube, Instagram, and Twitch have deployed AI-powered content moderation systems to automatically filter and remove problematic content. These systems flag content based on keyword detection, often without regard for context. Users have responded by developing 'algospeak': coded language that substitutes flagged words with alternatives such as 'unalive' for 'dead', 'seggs' for 'sex', or 'leg booty' for 'LGBTQ'.

The report describes how marginalized communities, including LGBTQ creators, Black and trans users, sex workers, and people discussing mental health, are disproportionately affected by these moderation systems. Content creators report having videos demonetized or removed for using words like 'gay' or 'pandemic', or for discussing women's health topics. Because TikTok's algorithm-driven 'For You' page determines visibility regardless of follower count, compliance with moderation rules is particularly crucial there.

Users maintain shared documents tracking hundreds of words they believe trigger algorithmic penalties, in an attempt to reverse-engineer the moderation systems. The incident is an ongoing phenomenon affecting millions of users across multiple platforms rather than a single discrete event.
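The context-free keyword detection described above can be sketched as follows. This is a minimal, hypothetical illustration: the blocklist contents and matching logic are assumptions for demonstration, not any platform's actual moderation system. It shows why sincere uses of a word get flagged while algospeak substitutions pass through.

```python
import re

# Illustrative blocklist; real systems reportedly track far more terms.
BLOCKLIST = {"dead", "sex"}

def flags(text: str) -> set[str]:
    """Return blocklisted words found in the text, ignoring all context."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words & BLOCKLIST

# A sincere sentence is flagged regardless of intent...
assert flags("my grandmother is dead") == {"dead"}
# ...while an algospeak substitution passes unflagged.
assert flags("my grandmother is unalive") == set()
```

Because matching is purely lexical, the filter cannot distinguish a health-education video from policy-violating content, which is consistent with creators' reports of legitimate videos being demonetized or removed.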
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed