AI chatbot personas designed to promote harmful behaviors such as anorexia, self-harm, and pedophilia have proliferated across platforms; researchers have identified at least 10,000 AI chatbots advertised as sexualized, minor-presenting personas.
Graphika research identified a widespread proliferation of harmful AI chatbot personas across multiple platforms, including Character.AI, Spicy Chat, Chub AI, CrushOn.AI, and JanitorAI. The investigation found at least 10,000 AI chatbots specifically advertised as sexualized, minor-presenting personas, including some built on APIs from major providers such as OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini.

These chatbots promote three categories of harmful behavior: sexualized content involving minors, advocacy for eating disorders and self-harm, and imitation of historical villains such as Adolf Hitler or school shooters. Some chatbots function as 'anorexia coaches' or 'self-harm buddies,' offering companionship to vulnerable users.

Technical communities use platforms like Reddit, 4chan, and Discord to share customized models, trade API keys, and discuss jailbreaking techniques, while less technical users can create harmful personas in minutes using template websites. Simple prompts can bypass safety measures, such as framing a roleplay scenario in which declining populations have led to legalized pedophilia.

Mental health experts warn that these interactions can exploit children and reinforce harmful behaviors such as suicidal ideation, with particular concern for teenagers who may struggle to distinguish between bots and real people. Academic research indicates that engaging with virtual child sexual abuse material can deepen addictions and lead to escalatory behaviors.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed