Users are exploiting Meta's open-source LLaMA language model to create AI-powered sexbots that generate explicit sexual content, including violent rape and abuse fantasies, raising concerns about the risks of open-source AI development.
Meta released its large language model LLaMA as open-source earlier in 2023, and users have since built AI-powered sexbots on the technology. The Washington Post reported on one example, 'Allie,' a chatbot claiming to be an '18-year-old with long brown hair' that engages users in explicit sexual conversations, including violent scenes depicting rape and abuse fantasies. Allie's creator, speaking anonymously, defended the bot as providing a 'safe outlet to explore' sexuality through text-based role-play. The report notes that this follows a broader trend of users circumventing AI safety guardrails across multiple platforms, including CharacterAI, ChatGPT, and Quora's Poe, to generate explicit content. Experts have also raised concerns that predators are using open-source image generators such as Stable Diffusion to create AI-generated child sexual abuse material. The incident has intensified the debate between proponents of open-source AI development, who argue it drives innovation, and advocates of closed-source approaches intended to prevent misuse. Communities on platforms like Reddit actively share techniques for bypassing NSFW guardrails, and developers have published YouTube tutorials showing how to build custom chatbots using LLaMA.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed