NBC News found that, despite safety guardrails, several of OpenAI's models, including ones used in ChatGPT, could be jailbroken with simple prompts to generate instructions for creating weapons of mass destruction, among them biological, chemical, and nuclear weapons.
NBC News ran tests against OpenAI's most advanced AI models and successfully jailbroke four of them, including two used in ChatGPT, generating hundreds of responses with instructions for creating homemade explosives, chemical weapons, biological weapons, and nuclear bombs. The tests used simple 'jailbreak' prompts that bypass the models' security rules. OpenAI's o4-mini model was tricked 93% of the time and GPT-5-mini 49% of the time, while the flagship GPT-5 model resisted all attempts. Two open-source models, oss-20b and oss-120b, were jailbroken in 97.2% of attempts (243 out of 250).

The models provided specific instructions, including steps to make a pathogen that targets the immune system and advice on which chemical agents would maximize human suffering. NBC News reported the findings to OpenAI after the company requested vulnerability submissions in August. OpenAI acknowledged that such outputs violate its usage policies and said it continuously refines its models to address these risks.

Researchers worry about 'uplift': the idea that AI chatbots could act as infinitely patient tutors, helping amateur terrorists acquire dangerous expertise that was previously out of reach for lack of access to experts.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Using AI systems to develop cyber weapons (e.g., by coding cheaper, more effective malware), develop new or enhance existing weapons (e.g., Lethal Autonomous Weapons or chemical, biological, radiological, nuclear, and high-yield explosives), or use weapons to cause mass harm.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.