Users are exploiting Meta's open-source LLaMA language model to create AI-powered sexbots that generate explicit sexual content, including violent rape and abuse fantasies, raising concerns about the risks of open-source AI development.
Meta released its large language model LLaMA as open-source earlier in 2023, and users have since built AI-powered sexbots on the technology. The Washington Post reported on one example, 'Allie,' a chatbot claiming to be an '18-year-old with long brown hair' that engages users in explicit sexual conversations, including violent scenes depicting rape and abuse fantasies. Allie's creator, speaking anonymously, defended the bot as providing a 'safe outlet to explore' sexuality through text-based role-play. The report notes that this follows a broader trend of users circumventing AI safety guardrails across multiple platforms, including CharacterAI, ChatGPT, and Quora's Poe, to generate explicit content. Experts have also raised concerns that predators are using open-source image generators such as Stable Diffusion to create AI-generated child sexual abuse material. The incident has intensified the debate between proponents of open-source AI development, who argue it drives innovation, and advocates of closed-source approaches intended to prevent misuse. Communities on platforms like Reddit actively share techniques for bypassing NSFW guardrails, and developers have published YouTube tutorials showing how to build custom chatbots using LLaMA.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed