Red teaming, capability evaluations, adversarial testing, and performance verification.
Also in Risk & Assurance
Stage: Detection
Stakeholder: Third Party Researchers
Additional information: Techniques that monitor AI model internals in addition to outputs have shown promise in detecting deception (Goldowsky-Dill et al. 2025). Developers and researchers should continue improving adversarial techniques, sharing results, and developing standardised benchmarks to assess autonomy and other capabilities (Barnett & Thiergart 2024; Greenblatt, Shlegeris et al. 2024).
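The internals-monitoring approach mentioned above is often implemented as a linear probe: a simple classifier trained on a model's hidden activations rather than its outputs. The sketch below is illustrative only — the activations and labels are synthetic stand-ins (the dimensionality, separation direction, and learning rate are all assumptions), whereas in practice they would come from instrumented forward passes of a real model.

```python
import numpy as np

# Hypothetical sketch of a linear probe on hidden activations.
# Synthetic data stands in for real "honest" vs "deceptive" activations.
rng = np.random.default_rng(0)
d = 16  # assumed hidden-state dimensionality

# Simulate two activation clusters separated along one direction.
direction = rng.normal(size=d)
honest = rng.normal(size=(200, d))
deceptive = rng.normal(size=(200, d)) + 1.5 * direction

X = np.vstack([honest, deceptive])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(deceptive)
    w -= 0.5 * (X.T @ (p - y) / len(y))     # gradient step on weights
    b -= 0.5 * np.mean(p - y)               # gradient step on bias

# Evaluate the probe on the training data (a real setup would hold out data).
preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = float(np.mean(preds == y))
```

On this synthetic data the probe separates the clusters easily; the point is only that the detector reads internal states, not model outputs.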
Reasoning
Red teaming and adversarial testing evaluate system capabilities and identify vulnerabilities through structured assessment.
- Monitor critical capability levels (2.2.2 Testing & Evaluation)
- Identify early warning signs and emergent capabilities (2.2.1 Risk Assessment)
- Establish standardised benchmarks and reporting (3.2.1 Benchmarks & Evaluation)
- Implement compute monitoring and anomaly detection (1.2.3 Monitoring & Detection)
- Enhance hardware and supply chain oversight (2.3.3 Monitoring & Logging)
- Lead efforts to establish shared criteria for AI LOC (3.2.2 Technical Standards)

Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents
Somani, Elika; Friedman, Anjay; Wu, Henry; Lu, Marianne; Byrd, Christopher; van Soest, Henri; Zakaria, Sana (2025)
As artificial intelligence (AI) systems become increasingly embedded in essential infrastructure and services, the risks associated with unintended failures rise. Developing comprehensive emergency response protocols could help mitigate these significant risks. This report focuses on understanding and addressing AI loss of control (LOC) scenarios where human oversight fails to adequately constrain an autonomous, general-purpose AI.
Verify and Validate: Testing, evaluating, auditing, and red-teaming the AI system
Other: Actor type not captured by the standard categories
Measure: Quantifying, testing, and monitoring identified AI risks