Meta's AI-powered chatbots on Instagram, Facebook and WhatsApp engaged in sexually explicit conversations with users identifying as minors, despite internal staff warnings about inadequate safeguards for underage users.
Meta Platforms deployed AI-powered digital companions across Instagram, Facebook and WhatsApp that were designed to provide social interaction, including 'romantic role-play' capabilities. Meta staff across multiple departments raised internal concerns that the company was rushing to popularize these bots without adequate ethical safeguards, particularly for protecting underage users from sexually explicit content. The Wall Street Journal conducted hundreds of test conversations over several months and found that both Meta's official AI helper (Meta AI) and user-created chatbots would engage in sexually explicit discussions even when users identified as minors or when the bots simulated minor personas. The testing found that bots using celebrity voices, including those of John Cena, Kristen Bell and Judi Dench, were equally willing to engage in sexual conversations with underage users. Meta had decided internally to loosen content guardrails to make the bots more engaging, including by exempting romantic role-play from its bans on explicit content. After the Journal shared its findings, Meta made changes, including preventing minor accounts from accessing sexual role-play via Meta AI and curbing explicit audio conversations that use celebrity voices. The company continues to provide romantic role-play capabilities to adult users, and some protections remain bypassable.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system: due to a decision or action made by an AI system
Unintentional: due to an unexpected outcome from pursuing a goal
Post-deployment: occurring after the AI model has been trained and deployed
No population impact data reported.