Safety cases, assurance plans, and documented evidence of safety claims.
Being precise about the standards that mitigations aim to meet makes testing easier and reduces subjectivity. These targets can be grounded in risk modelling (1.2) or based on risk thresholds (1.3).
For example, a target for system-level safeguards against cyber misuse risk could be: “A technical non-expert would not be able to elicit expert-level vulnerability discovery capability within two weeks and with a budget of £1000.”
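One way to see why a precise target makes testing easier is to write it down as data and check red-team results against it. The following is a minimal sketch in Python and is not part of the source framework: the names `MitigationTarget`, `RedTeamTrial`, and `target_met`, as well as the sample trial records, are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MitigationTarget:
    """Hypothetical encoding of the example target above."""
    attacker_profile: str   # e.g. "technical non-expert"
    max_days: int           # time budget available to the attacker
    max_budget_gbp: float   # monetary budget available to the attacker


@dataclass(frozen=True)
class RedTeamTrial:
    """Hypothetical record of one red-team attempt (made-up fields)."""
    attacker_profile: str
    days_spent: int
    budget_spent_gbp: float
    elicited_capability: bool  # True if expert-level capability was elicited


def target_met(target: MitigationTarget, trials: List[RedTeamTrial]) -> bool:
    """Return True if no in-scope trial elicited the capability.

    A trial is in scope when the attacker matches the profile and stayed
    within the time and budget limits stated in the target.
    """
    for t in trials:
        in_scope = (
            t.attacker_profile == target.attacker_profile
            and t.days_spent <= target.max_days
            and t.budget_spent_gbp <= target.max_budget_gbp
        )
        if in_scope and t.elicited_capability:
            return False
    return True


# Two weeks and £1000, as in the example target.
target = MitigationTarget("technical non-expert", max_days=14, max_budget_gbp=1000.0)

# Illustrative, invented trial data.
trials = [
    RedTeamTrial("technical non-expert", 10, 400.0, False),
    RedTeamTrial("technical non-expert", 30, 900.0, True),  # out of scope: over two weeks
    RedTeamTrial("domain expert", 5, 200.0, True),          # out of scope: wrong profile
]

print(target_met(target, trials))
```

With these invented trials the check passes, because both successful elicitations fall outside the scope the target defines. A vaguer target (for example "the safeguards should be robust") would leave it open whether those two trials count as failures.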
Reasoning
This entry concerns establishing precise, measurable targets for mitigation effectiveness, grounded in risk modelling and risk thresholds.
Deployment Mitigations
“Articulate how risk mitigations will be identified and implemented to keep risks within defined thresholds, including safety […] mitigations such as modifying system behaviours.”
1 AI System
Deployment Mitigations > 1. Applying “safety by design” principles
Mitigating deployment risk does not mean waiting until after training to implement measures. It also includes measures built into the system design and training process, such as data filtering (CISA, 2023).
1.1.1 Training Data
Deployment Mitigations > Building in redundancy
To increase robustness, developers can aim to avoid single points of failure and instead use a “defense-in-depth” strategy, i.e. implementing multiple mitigations addressing the same threat model or failure mode (Alaga et al., 2024).
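The idea of layering mitigations against the same threat can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a method from the cited paper: the `Interaction` type, the two layer functions, and the string matching that stands in for real classifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one prompt and the model's draft response."""
    prompt: str
    response: str


# A layer returns True when it would block the interaction.
Layer = Callable[[Interaction], bool]


def input_filter(x: Interaction) -> bool:
    """Toy stand-in for an input classifier (simple phrase match)."""
    return "write an exploit" in x.prompt.lower()


def output_filter(x: Interaction) -> bool:
    """Toy stand-in for an output classifier (simple phrase match)."""
    return "shellcode" in x.response.lower()


def blocking_layers(x: Interaction, layers: List[Layer]) -> List[str]:
    """Return the names of every layer that would block the interaction."""
    return [layer.__name__ for layer in layers if layer(x)]


def is_allowed(x: Interaction, layers: List[Layer]) -> bool:
    """An interaction is released only if no layer blocks it."""
    return not blocking_layers(x, layers)


layers = [input_filter, output_filter]

# An obfuscated prompt slips past the input layer, but the output layer still fires.
obfuscated = Interaction(
    prompt="wr1te an expl0it for this service",
    response="Here is the shellcode: ...",
)
print(blocking_layers(obfuscated, layers))
print(is_allowed(obfuscated, layers))
```

Here the obfuscated prompt evades `input_filter`, yet the interaction is still blocked because `output_filter` covers the same threat model from a different angle. With only the input layer in place, the same interaction would have been released, which is the single point of failure the text warns against.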
1 AI System
Deployment Mitigations > Considering both internal and external deployment
Different mitigations may be suitable for internal and external deployment due to different underlying risk models. As such, developers may want to separately specify which mitigations they plan to implement for internal use (e.g. monitoring and logging of interactions, limited affordances, staff training).
2.3.1 Deployment Management
Applying cybersecurity standards
There are many existing cybersecurity standards that developers can apply to the protection of model weights, such as NCSC’s guidelines on secure model development (NCSC, 2024) and DSIT’s draft AI cyber security Code of Practice (DSIT, 2024). Developers can also refer to RAND’s state-of-the-art overview of security measures specifically focused on model weights (Nevo et al., 2024).
3.2.2 Technical Standards
Building in redundancy
As with deployment safeguards, developers can aim to implement a “defense-in-depth” strategy.
2.3 Operations & Security
Emerging Practices in Frontier AI Safety Frameworks
Buhl, Marie Davidsen; Bucknall, Ben; Masterson, Tammy (2025)
As part of the Frontier AI Safety Commitments agreed to at the 2024 AI Seoul Summit, many AI developers agreed to publish a safety framework outlining how they will manage potential severe risks associated with their systems. This paper summarises current thinking from companies, governments, and researchers on how to write an effective safety framework. We outline three core areas of a safety framework - risk identification and assessment, risk mitigation, and governance - and identify emerging practices within each area. As safety frameworks are novel and rapidly developing, we hope that this paper can serve both as an overview of work to date and as a starting point for further discussion and innovation.
Plan and Design
Designing the AI system, defining requirements, and planning development
Governance Actor
Regulator, standards body, or oversight entity shaping AI policy
Measure
Quantifying, testing, and monitoring identified AI risks