YouTuber Built, Made Publicly Available, and Released Model Trained on Toxic 4chan Posts as Prank

Jun 3, 2022 | 2 reports | Severity: Minor | Autonomous | High confidence

YouTuber Yannic Kilcher trained an AI bot called GPT-4chan on 3.3 million posts from 4chan's toxic /pol/ board, then deployed nine instances of the bot that posted approximately 15,000 racist and offensive messages over 24 hours before sharing the underlying model publicly.

YouTuber and AI researcher Yannic Kilcher created an AI language model called GPT-4chan by training it on 3.3 million posts from 4chan's Politically Incorrect (/pol/) board, known for racist, misogynistic, and antisemitic content. After training, Kilcher deployed nine instances of the bot onto /pol/ for 24 hours, during which they posted approximately 15,000 times, representing over 10% of all posts on the board that day. The bot effectively replicated the toxic tone of /pol/, including racial slurs and conspiracy theories. Kilcher then shared the underlying AI model on Hugging Face, an AI community platform, describing the project as a 'prank' and 'light-hearted trolling.' AI researchers and ethicists criticized the project as an unethical experiment that exposed users, including teenagers, to AI-generated harmful content without consent. Hugging Face initially restricted access to the model and later blocked all downloads entirely. Critics argued that while creating offensive AI bots was previously limited to large tech companies, Kilcher's project demonstrated that individual developers could now create and deploy such systems at scale.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

1 Discrimination & Toxicity

1.2 Exposure to toxic content

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

Causal Classification

Entity

Human

Due to a decision or action made by humans

Intent

Intentional

Due to an expected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score: 2 (Minor; Toxic or Malicious Content, direct)

National Security Assessment

Overall Score

Stakeholders

: Yannic Kilcher
: Yannic Kilcher
: Internet Social Platform Users

AI System Classification

: Social Media Content Generation
: Text Style Replication
: Autonomous
: 3 Limited Risk
: 1

Population Impact

: 15,000
