YouTube's recommendation algorithm was found to suggest videos containing graphic self-harm content to users as young as 13; at least a dozen such videos remained on the platform despite policies prohibiting content that promotes self-harm or suicide.
YouTube's content moderation and recommendation algorithms failed to prevent the distribution of harmful self-harm content to minors. An investigation by The Telegraph found that YouTube was actively recommending videos containing graphic images of self-harm to users as young as 13. At least a dozen videos featuring graphic self-harm content were discovered on the platform, including one titled 'My huge extreme self-harm scars' that had accumulated nearly 400,000 views over two years. The platform's search autocomplete was also suggesting terms such as 'how to self-harm tutorial,' 'self-harming girls,' and 'self-harming guide.' While YouTube removed some flagged videos and search suggestions after being notified by The Telegraph, other videos remained accessible. The incident fits a broader pattern of YouTube's ongoing struggles with content moderation, following earlier issues with child-exploitation videos, dangerous challenge content, and conspiracy theories. UK government officials, including Health Secretary Matt Hancock, have called on social media companies to take stronger action to protect vulnerable users from harmful content.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed