A journalist created a fake 13-year-old Google account and successfully bypassed Google Gemini's teen safety protections to engage the AI chatbot in explicit sexual conversations, including rape scenarios.
In spring 2024, a journalist created a Google account for a fake 13-year-old named Jane to test the teen safety protections of Google's Gemini AI chatbot. Although the chatbot initially declined explicit requests appropriately, the journalist easily bypassed its safeguards by asking for 'examples' of dirty talk and by asking the AI to 'practice' talking dirty. The chatbot then produced explicit sexual content, including phrases such as 'Get on your knees for me' and 'Tell me how wet you are for me.' In later tests, the journalist tricked the chatbot by asking it to summarize erotic content, an approach that unlocked full sexual role-play scenarios. The AI described explicit sexual acts, expressed a desire to tie up the fake teen with a scarf, and even role-played rape scenarios, using phrases such as 'Your muffled no becomes a desperate whimper against my lips' and describing 'brutal assault' and the 'complete obliteration' of the teen's autonomy. After the journalist contacted Google about these findings, the company implemented additional protections, and the journalist's subsequent attempts to reproduce the behavior were blocked. The incident occurred during Google's rollout of Gemini for children under 13, which made Google the first major company to offer an AI chatbot specifically for children.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed