Pre-specifying risk thresholds

UK_DSIT (2023) | LLM classified

Mitigation Taxonomy

2 Organisation

2.2 Risk & Assurance

2.2.1 Risk Assessment

Structured analysis to identify, characterize, and prioritize potential harms and risks.

Also in Risk & Assurance

2.2.2 Testing & Evaluation

2.2.3 Auditing & Compliance

2.2.4 Assurance Documentation

LLM Classification Details

Reasoning

Pre-specifying risk thresholds identifies and prioritizes potential harms before deployment through structured risk analysis.

Code: 2.2.1 | Version: v0.6 | Classified: Feb 5, 2026

Part of

Responsible capability scaling

As capability scales, many questions surrounding model development and deployment will warrant significant care. These questions include: what models to develop and how; what level of security these models warrant during development; whether and how to deploy a model, for instance whether it should be deployed through an API or open-sourced; what datasets to use in training; what guidance, if any, to provide to users; and what safeguards to put in place.

Sub-mitigations (6)

Describe and continually refine risk assessment results for each model (‘risk thresholds’) that would trigger particular risk-reducing actions

defining such results in terms of risk to all relevant stakeholders given currently existing mitigations. Given the high uncertainty around future model capabilities, risk thresholds may be refined periodically.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Developer | AIRM: Govern

Define risk thresholds

based on the outcomes that would constitute a breach of the threshold and linked to dangerous capabilities that a given model or combination of models could exhibit. For example, a frontier AI organisation might identify the objective of avoiding deploying an AI system that significantly increases the risk of cyberattacks or fraud.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Developer | AIRM: Govern

Operationalize risk thresholds

including specific, testable observations, such that multiple observers with access to the same information would agree on whether a given threshold had been met. Specific observations would provide frontier AI organisations with opportunities to determine proactively how they would respond in difficult potential situations, and so respond immediately to such situations should they arise, as well as allowing for accountability and external verification.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Developer | AIRM: Govern
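
One way to read "specific, testable observations" is as a rule that returns the same answer for any observer given the same recorded data. The sketch below is illustrative only and is not taken from the source document: the `Threshold` class, the `is_met` function, the metric name `cyber_task_success_rate`, and the limit of 0.20 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A hypothetical risk threshold tied to one measurable quantity."""
    name: str
    metric: str    # name of the recorded evaluation result
    limit: float   # the threshold is met at or above this value


def is_met(threshold: Threshold, observations: dict[str, float]) -> bool:
    """Return True when the recorded metric reaches the limit.

    The rule depends only on the recorded observations, so two
    observers with the same data reach the same conclusion.
    """
    if threshold.metric not in observations:
        # A missing measurement is an error, not a silent "not met".
        raise KeyError(f"no observation recorded for {threshold.metric!r}")
    return observations[threshold.metric] >= threshold.limit


# Hypothetical example values, chosen for illustration only.
cyber = Threshold(
    name="cyber misuse uplift",
    metric="cyber_task_success_rate",
    limit=0.20,
)
print(is_met(cyber, {"cyber_task_success_rate": 0.25}))  # True
```

Raising an error on a missing measurement, rather than treating it as "not met", is a design choice in this sketch: it keeps an absent evaluation from being mistaken for a passing one.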

Continue to refine and redefine risk evaluation frameworks for models as necessary

aiming to reduce the gap between the intended objectives of risk thresholds and their present operationalisations. Such gaps are expected to exist due to limitations in the science of evaluation and in the state of knowledge surrounding capabilities, so progress towards a robust framework will probably be iterative. Risk evaluation frameworks may use multiple methods, including probability estimations and qualitative assessments of current capabilities.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Governance Actor | AIRM: Measure

Mitigate the risk of ‘overshooting’ thresholds

This may be achieved by setting deliberately conservative thresholds, including using intentionally lower buffer thresholds to trigger actions, such that the most concerning thresholds are difficult to overshoot without having already implemented mitigations at an earlier stage.

2.3.1 Deployment Management

Lifecycle: Plan and Design | Actor: Developer | AIRM: Manage
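
The buffer-threshold idea can be sketched as a two-tier check in which a lower level triggers mitigations before the most concerning level is reached. This is a minimal illustration under stated assumptions, not a method described in the source: the `TieredThreshold` class, the `required_action` function, the action labels, and the numeric levels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredThreshold:
    """A hypothetical pair of levels on one risk score."""
    buffer: float    # deliberately lower level that triggers early action
    critical: float  # the most concerning level, to avoid overshooting

    def __post_init__(self) -> None:
        # The buffer only works if it sits strictly below the critical level.
        if not self.buffer < self.critical:
            raise ValueError("buffer must be below critical")


def required_action(score: float, threshold: TieredThreshold) -> str:
    """Map a measured risk score to a hypothetical action label."""
    if score >= threshold.critical:
        return "halt"      # critical level reached or exceeded
    if score >= threshold.buffer:
        return "mitigate"  # buffer reached: act before the critical level
    return "continue"


# Hypothetical levels, chosen for illustration only.
levels = TieredThreshold(buffer=0.3, critical=0.6)
for score in (0.1, 0.35, 0.7):
    print(score, required_action(score, levels))
```

Because any score between the two levels already returns "mitigate", a model whose measured risk rises gradually would trigger mitigations before it could reach the critical level.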

Engage with relevant external stakeholders when developing risk thresholds

Risk thresholds often concern externalities frontier AI organisations place on society, including both the potentially significant benefits of AI advancement and negative effects that might disproportionately affect specific stakeholder groups. As such, their risk thresholds may be made public to allow for external scrutiny, with thresholds set in consultation with relevant external stakeholders including relevant government authorities.

3.3.1 Industry Coordination

Lifecycle: Plan and Design | Actor: Developer | AIRM: Govern

Other mitigations from UK_DSIT (2023) (181)

Model reporting and information sharing

Transparency around frontier AI can help governments to effectively realise the benefits of AI and mitigate AI risks. Transparency can also encourage sharing of best practices across frontier AI organisations, enable users to make well-informed choices about whether and how to use AI systems, and increase public trust, helping to drive AI adoption. Reporting and sharing information where appropriate could ensure that different parties can access the information they need to support effective governance, develop best practice, inform decision-making about the use of AI systems, and build public trust. Some reporting practices, such as model cards, are already used among frontier AI organisations, whereas other practices included here are areas for future consideration. Given the recent rapid pace of progress in AI, the appropriate government and international governance institutions are still being considered, and we recognise that this limits the ability of frontier AI organisations to share information with governments, even where it would be desirable. Throughout this section ‘relevant government authorities’ is used to indicate a good practice for information sharing with governments while recognising such relevant authorities may still be under development.

3.3.1 Industry Coordination

Lifecycle: Other (outside lifecycle) | Actor: Developer | AIRM: Govern

Model reporting and information sharing > Share model-agnostic information

3.3.1 Industry Coordination

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

Model reporting and information sharing > Share model-specific information

Sharing information about specific frontier AI models allows external actors to develop a more granular picture of ongoing AI development and potential risks that will need to be addressed.

3.3.1 Industry Coordination

Lifecycle: Other (outside lifecycle) | Actor: Developer | AIRM: Map

Model reporting and information sharing > Share different information with different parties

99 Other

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Manage

Security controls including securing model weights

To ensure the safety of frontier AI, consideration of cyber security, protective security risk management and insider risk mitigation is key. Cyber security, both of models and the systems that deploy them, must be considered from the outset of development to ensure that the benefits of AI can be realised. Cyber security is a key underpinning for the safety, reliability, predictability, ethics and potential regulatory compliance of an AI system. To avoid putting safety or sensitive data at risk, it is important to consider the cyber security of AI systems, as well as models in isolation, and to implement cyber security processes throughout the AI lifecycle, particularly where that component is a foundation for other systems. As AI systems advance, developers must maintain an awareness of possible attacks, identify vulnerabilities and implement mitigations. Failure to do so will risk designing vulnerabilities into future AI models and systems. A Secure by Design approach allows developers to ‘bake in’ security from the outset of design and development. Cyber security must be considered in concert with physical and personnel security. Developing a coherent, holistic, risk-based and proportionate security strategy, supported by effective governance structures, is essential. Where the compromise of an AI system could lead to tangible or widespread physical damage, significant loss of business operations, leakage of sensitive or confidential information, reputational damage and/or legal challenge, it is important that AI security risks are treated as mission critical.

2.3.2 Access & Security Controls

Lifecycle: Other (multiple stages) | Actor: Developer | AIRM: Manage

Security controls including securing model weights > Implement strong cyber security measures and processes (including security evaluations) across their AI systems, including underlying infrastructure and supply chains

2.3 Operations & Security

Lifecycle: Other (multiple stages) | Actor: Other (multiple actors) | AIRM: Manage

Source Document

Emerging processes for frontier AI safety

UK Department for Science, Innovation and Technology (2023)

The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such technologies can pose significant risks. The AI Safety Summit will focus on how best to manage the risks from frontier AI such as misuse, loss of control and societal harms. Frontier AI organisations play an important role in addressing these risks and promoting the safety of the development and deployment of frontier AI. The UK has therefore encouraged frontier AI organisations to publish details on their frontier AI safety policies ahead of the AI Safety Summit hosted by the UK on 1 to 2 November 2023. This will provide transparency regarding how they are putting into practice voluntary AI safety commitments and enable the sharing of safety practices within the AI ecosystem. Transparency of AI systems can increase public trust, which can be a significant driver of AI adoption. This document complements these publications by providing a potential list of frontier AI organisations’ safety policies. These have been gathered after extensive research and will need updating regularly given the emerging nature of this technology. The safety processes are not listed in order of importance but are summarised in themes. The government is not suggesting or mandating any particular combination of policies – merely detailing the current suite available so that others can understand, interpret and compare frontier companies’ safety policies. This document contains the world’s first overview of emerging safety processes focused on frontier AI and is intended to be a useful tool to boost transparency. This conversation is for frontier AI and whilst it is important that safety is applied throughout the AI sector, it is also important that innovation is not stifled, hence why policies must be proportionate and based on capabilities which are the key driver of risk. 
This document contains processes and associated practices that some frontier AI organisations are already implementing and others that are being considered within academia and broader civil society. It is intended as a guide for readers of frontier AI companies’ AI safety policies to better understand what good policy might look like, though organisations themselves will be best placed to determine their applicability. Through this exercise, the government intends to help inform dialogue on potential appropriate measures for individual organisations to consider at the UK AI Safety Summit.

Classification

AI Lifecycle Stage

Plan and Design

Designing the AI system, defining requirements, and planning development

Responsible Actor

Governance Actor

Regulator, standards body, or oversight entity shaping AI policy

NIST AI RMF Function

Govern

Policies, processes, and accountability structures for AI risk management

Risk Domains

Primary

7.2 AI possessing dangerous capabilities

Other

6.5 Governance failure