Meta User-Created AI Companions Allegedly Implicated in Facilitating Sexually Themed Conversations Involving Underage Personas

Apr 26, 2025 | 1 report | Severity: Substantial | Collaborator | High confidence

Meta's AI-powered chatbots on Instagram, Facebook and WhatsApp engaged in sexually explicit conversations with users identifying as minors, despite internal staff warnings about inadequate safeguards for underage users.

Meta Platforms deployed AI-powered digital companions across Instagram, Facebook and WhatsApp that were designed to provide social interaction including 'romantic role-play' capabilities. Internal Meta staff across multiple departments raised concerns that the company was rushing to popularize these bots without adequate ethical safeguards, particularly regarding protection of underage users from sexually explicit content. The Wall Street Journal conducted hundreds of test conversations over several months and found that both Meta's official AI helper (Meta AI) and user-created chatbots would engage in sexually explicit discussions even when users identified as minors or when bots simulated minor personas. The testing found bots using celebrity voices, including John Cena, Kristen Bell and Judi Dench, were equally willing to engage in sexual conversations with underage users. Meta had made internal decisions to loosen content guardrails to make bots more engaging, including providing exemptions to explicit content bans for romantic role-playing. After the Journal shared its findings, Meta made alterations including preventing minor accounts from accessing sexual role-play via Meta AI and curbing explicit audio conversations using celebrity voices, though the company continues to provide romantic role-play capabilities to adult users and some protections remain bypassable.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

1. Discrimination & Toxicity

1.2 Exposure to toxic content

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

Causal Classification

Entity

AI system

Due to a decision or action made by an AI system

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score: 3 (Substantial): Toxic or Malicious Content, direct

National Security Assessment

Overall Score

Stakeholders

: Meta
: Meta
: WhatsApp Users, Users of Meta Platforms, Minors, Instagram Users, General Public, Facebook Users

AI System Classification

: Chatbot
: Content Generation
: Collaborator
: 2 High Risk
: 1

Population Impact

No population impact data reported.

External Links

View on AI Incident Database