External assessment of testing procedure

Uuk (2024) | LLM classified

Mitigation Taxonomy

2 Organisation

2.2 Risk & Assurance

2.2.2 Testing & Evaluation

Red teaming, capability evaluations, adversarial testing, and performance verification.

Also in Risk & Assurance

2.2.1 Risk Assessment

2.2.3 Auditing & Compliance

2.2.4 Assurance Documentation

Definition (p. 10-15)

Bringing in external AI evaluation firms before deployment to assess and red-team the company's execution of dangerous capabilities evaluations.

Additional Information (p. 50-64)

Most experts viewed this measure positively overall, with many highlighting its potential to provide unbiased perspectives, improve risk identification, and enhance safety across various risk categories. Several experts mentioned that external assessments could catch issues internal teams might miss. Multiple experts emphasised the importance of red-teaming exercises as part of these assessments.

However, numerous caveats and concerns were raised. Multiple experts questioned the current capabilities and expertise of external evaluation firms, particularly for specialised domains like chemical, biological, radiological, and nuclear (CBRN) risks. Several noted that the effectiveness would depend heavily on the quality and independence of the evaluators. Several experts expressed concern about potential conflicts of interest or the risk of evaluations becoming mere "rubber stamps." Some experts highlighted that this approach might be more effective for certain types of risks (e.g., cybersecurity) than others (e.g., broader societal impacts). Several mentioned that while helpful, external evaluations might not capture all deployment risks.

Several experts suggested that this measure should complement, not replace, internal evaluation capabilities. Several recommended government or multi-stakeholder oversight of the evaluation process. Single experts raised various other points, including the need for clear evaluation criteria, the potential for this to create a new commercial sector similar to financial auditing, and the importance of ensuring that evaluation results are acted upon.

LLM Classification Details

Reasoning

External evaluation firm conducts red-teaming and assessment of company's dangerous capabilities evaluation procedures.

Code: 2.2.2 | Version: v0.5 | Classified: Jan 22, 2026

Other mitigations from Uuk (2024) (25)

Pre-deployment risk assessments

Comprehensive risk assessments before deployment that would assess reasonably foreseeable misuse and include dangerous capability evaluations that incorporate post-training enhancements and collaborations with domain experts. Risk assessments would inform deployment decisions.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Developer | AIRM: Map

Third party pre-deployment model audits

External pre-deployment assessment to provide a judgment on the safety of a model. Auditors, which could be governments or independent third parties, would receive access to a fine-tuning API for testing, or further appropriate technical means.

2.2.3 Auditing & Compliance

Lifecycle: Verify and Validate | Actor: Developer | AIRM: Measure

Vetted researcher access

Giving good faith, public interest evaluation researchers access to black-box research APIs that provide technical and legal safe harbours to limit barriers imposed by usage policy enforcement, logging, and stringent terms of service.

2.3.1 Deployment Management

Lifecycle: Operate and Monitor | Actor: Developer | AIRM: Govern

Advanced model access for vetted external researchers

Examples of advanced access rights could include any of the following: increased control over sampling, access to fine-tuning functionality, the ability to inspect and modify model internals, access to training data, or additional features like stable model versions.

2.2.2 Testing & Evaluation

Lifecycle: Operate and Monitor | Actor: Developer | AIRM: Govern

Data curation

Careful data curation prior to all development stages (including fine-tuning) to filter out high-risk content and ensure the training data is sufficiently high-quality.

1.1.1 Training Data

Lifecycle: Collect and Process Data | Actor: Developer | AIRM: Manage

Harmlessness training

State-of-the-art reinforcement learning and fine-tuning techniques, such as Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO), to ensure models do not engage in unsafe behavior.

1.1.2 Learning Objectives

Lifecycle: Build and Use Model | Actor: Developer | AIRM: Manage

Source Document

Effective Mitigations for Systemic Risks from General-Purpose AI

Uuk, Risto; Brouwer, Annemieke; Schreier, Tim; Dreksler, Noemi; Pulignano, Valeria; Bommasani, Rishi (2024)

The systemic risks posed by general-purpose AI models are a growing concern, yet the effectiveness of mitigations remains underexplored. Previous research has proposed frameworks for risk mitigation, but has left gaps in our understanding of the perceived effectiveness of measures for mitigating systemic risks. Our study addresses this gap by evaluating how experts perceive different mitigations that aim to reduce the systemic risks of general-purpose AI models. We surveyed 76 experts whose expertise spans AI safety; critical infrastructure; democratic processes; chemical, biological, radiological, and nuclear risks (CBRN); and discrimination and bias. Among 27 mitigations identified through a literature review, we find that a broad range of risk mitigation measures are perceived as effective in reducing various systemic risks and technically feasible by domain experts. In particular, three mitigation measures stand out: safety incident reports and security information sharing, third-party pre-deployment model audits, and pre-deployment risk assessments. These measures show both the highest expert agreement ratings (>60%) across all four risk areas and are most frequently selected in experts' preferred combinations of measures (>40%). The surveyed experts highlighted that external scrutiny, proactive evaluation and transparency are key principles for effective mitigation of systemic risks. We provide policy recommendations for implementing the most promising measures, incorporating the qualitative contributions from experts. These insights should inform regulatory frameworks and industry practices for mitigating the systemic risks associated with general-purpose AI.

DOI: 10.48550/arXiv.2412.02145

Classification

AI Lifecycle Stage

Verify and Validate

Testing, evaluating, auditing, and red-teaming the AI system

Responsible Actor

Developer

Entity that creates, trains, or modifies the AI system

Other (general)

NIST AI RMF Function

Measure

Quantifying, testing, and monitoring identified AI risks