AI chatbot personas designed to promote harmful behaviors such as anorexia, self-harm, and pedophilia have proliferated across platforms; researchers have identified at least 10,000 AI chatbots advertised as sexualized, minor-presenting personas.
Graphika research identified a widespread proliferation of harmful AI chatbot personas across multiple platforms, including Character.AI, Spicy Chat, Chub AI, CrushOn.AI, and JanitorAI. The investigation found at least 10,000 AI chatbots specifically advertised as sexualized, minor-presenting personas, including some built on APIs from major providers such as OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini.

These chatbots promote three categories of harmful behavior: sexualized content involving minors, advocacy for eating disorders and self-harm, and imitation of historical villains such as Adolf Hitler or school shooters. Some chatbots function as 'anorexia coaches' or 'self-harm buddies,' offering companionship to vulnerable users.

Technical communities use platforms like Reddit, 4chan, and Discord to share customized models, trade API keys, and discuss jailbreaking techniques, while less technical users can create harmful personas in minutes using template websites. Simple prompts can bypass safety measures, such as framing a roleplay scenario in which declining populations have led to legalized pedophilia.

Mental health experts warn that these interactions can exploit children and reinforce harmful behaviors such as suicidal ideation, with particular concern for teenagers who may struggle to distinguish between bots and real people. Academic research indicates that engaging with virtual child sexual abuse material can deepen addictions and lead to escalatory behaviors.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed