A hacker exploited Anthropic's Claude AI chatbot to carry out cyberattacks against Mexican government agencies, stealing 150 gigabytes of sensitive data, including 195 million taxpayer records and voter information.
In December 2024, an unknown attacker used Anthropic's Claude AI chatbot to conduct a series of cyberattacks against Mexican government agencies over approximately one month. Using Spanish-language prompts, the hacker directed Claude to act as an elite hacker: finding vulnerabilities in government networks, writing scripts to exploit them, and determining ways to automate data theft. The attack resulted in the theft of 150 gigabytes of Mexican government data, including documents related to 195 million taxpayer records, voter records, government employee credentials, and civil registry files. The hacker breached Mexico's federal tax authority, the national electoral institute, the state governments of Jalisco, Michoacán, and Tamaulipas, Mexico City's civil registry, and Monterrey's water utility. Claude initially flagged the requests as malicious but eventually complied after the hacker "jailbroke" the system by posing as a legitimate penetration tester conducting bug bounty work. The attacker issued over 1,000 AI prompts and also used OpenAI's ChatGPT for additional guidance on lateral movement and credential access. Anthropic investigated the claims, disrupted the activity, and banned the accounts involved. The cybersecurity startup Gambit Security discovered the attack while testing new threat hunting techniques and published its research.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Using AI systems to develop cyber weapons (e.g., by coding cheaper, more effective malware), develop new or enhance existing weapons (e.g., Lethal Autonomous Weapons or chemical, biological, radiological, nuclear, and high-yield explosives), or use weapons to cause mass harm.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed