Foundational safety research, theoretical understanding, and scientific inquiry informing AI development.
Stage: Containment and Mitigation
Stakeholder: AI Developers
Additional information: AI developers and other stakeholders should further explore and advance research on containment methods. Existing research shows that current containment efforts face limitations, especially for self-replicating AI (Clymer, Wijk & Barnes 2024; Salib 2025; Pan et al. 2024). Investment should be directed toward containment technologies that can shut off models, restrict capabilities, limit harmful or unintended actions, and ensure retention of human control. This may also include research that uses AI models for containment and explores techniques such as sandboxing, model distillation and layered defence strategies.
Reasoning
Foundational research advances understanding of containment and layered-defence techniques that inform AI development.
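To make the layered-defence idea concrete, the following is a minimal sketch of a containment gate for model-issued actions: a capability allowlist, a content filter, a rate-based escalation to human review, and a kill switch. All names (ActionGate, ALLOWED_TOOLS, the thresholds) are illustrative assumptions, not part of any existing containment framework.

```python
# Illustrative layered-defence containment gate for model-issued actions.
# Every layer must approve an action before it is allowed through.

ALLOWED_TOOLS = {"search", "calculator"}            # layer 1: capability allowlist
BLOCKED_PATTERNS = ("rm -rf", "ssh ", "curl ")      # layer 2: content filter
REVIEW_THRESHOLD = 3                                # layer 3: escalate after N actions

class ActionGate:
    """Passes a model action only if every defensive layer approves it."""

    def __init__(self):
        self.actions_this_session = 0
        self.halted = False

    def halt(self):
        """Kill switch: once triggered, no further actions are allowed."""
        self.halted = True

    def check(self, tool: str, argument: str) -> str:
        if self.halted:
            return "halted"
        if tool not in ALLOWED_TOOLS:                          # layer 1
            return "blocked: tool not allowlisted"
        if any(p in argument for p in BLOCKED_PATTERNS):       # layer 2
            return "blocked: suspicious content"
        self.actions_this_session += 1
        if self.actions_this_session > REVIEW_THRESHOLD:       # layer 3
            return "escalated: human review required"
        return "allowed"
```

The point of the layering is that no single check is load-bearing: a request that slips past the allowlist can still be stopped by the content filter, the review threshold, or the human-operated kill switch.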
Monitor critical capability levels (2.2.2 Testing & Evaluation)
Identify early warning signs and emergent capabilities (2.2.1 Risk Assessment)
Establish standardised benchmarks and reporting (3.2.1 Benchmarks & Evaluation)
Implement compute monitoring and anomaly detection (1.2.3 Monitoring & Detection)
Enhance hardware and supply chain oversight (2.3.3 Monitoring & Logging)
Lead efforts to establish shared criteria for AI LOC (3.2.2 Technical Standards)

Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents
Somani, Elika; Friedman, Anjay; Wu, Henry; Lu, Marianne; Byrd, Christopher; van Soest, Henri; Zakaria, Sana (2025)
As artificial intelligence (AI) systems become increasingly embedded in essential infrastructure and services, the risks associated with unintended failures rise. Developing comprehensive emergency response protocols could help mitigate these significant risks. This report focuses on understanding and addressing AI loss of control (LOC) scenarios where human oversight fails to adequately constrain an autonomous, general-purpose AI.
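The compute monitoring and anomaly detection mentioned above could, in its simplest form, look like the sketch below: flag a compute-usage reading whose z-score against a trailing window exceeds a threshold, as an early-warning signal for unexpected activity. The class name, window size, and threshold are illustrative assumptions.

```python
# Hypothetical sketch of compute-usage anomaly detection: flag a reading
# that deviates sharply from the trailing window of recent readings.
from collections import deque
import math

class ComputeMonitor:
    def __init__(self, window: int = 10, z_threshold: float = 3.0):
        self.readings = deque(maxlen=window)   # trailing window of readings
        self.z_threshold = z_threshold

    def observe(self, gpu_hours: float) -> bool:
        """Record a usage reading; return True if it looks anomalous."""
        anomalous = False
        if len(self.readings) >= 3:  # need a minimal baseline first
            mean = sum(self.readings) / len(self.readings)
            var = sum((x - mean) ** 2 for x in self.readings) / len(self.readings)
            std = math.sqrt(var)
            if std > 0 and abs(gpu_hours - mean) / std > self.z_threshold:
                anomalous = True
        self.readings.append(gpu_hours)
        return anomalous
```

A real deployment would monitor richer signals (chip telemetry, cluster scheduling logs, network egress) and feed alerts into the kind of emergency-response protocols the report above proposes; the z-score test stands in for whatever detector is actually used.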
Other (outside lifecycle): Outside the standard AI system lifecycle
Developer: Entity that creates, trains, or modifies the AI system
Unable to classify: Could not be classified to a specific AIRM function