Runtime monitoring, observability, performance tracking, and anomaly detection in production.
Also in Operations & Security
Developers must therefore continuously monitor both KRIs and KCIs to ensure that, once a KRI threshold is crossed, the corresponding KCI thresholds are met, following the predefined "if-then" statements established in the risk analysis and evaluation phase.
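This "if-then" pairing of KRIs and KCIs can be made concrete in code. Below is a minimal sketch: the IfThenRule schema, the names bio_uplift_eval_score and security_level, and the threshold values are illustrative assumptions, not definitions from the paper. It shows how a monitoring loop could flag cases where a KRI threshold is crossed but the corresponding KCI threshold is not met.

```python
from dataclasses import dataclass

# A minimal sketch of the predefined "if-then" pairing of KRIs and KCIs.
# The schema and all names/values below are illustrative assumptions,
# not definitions from the paper.

@dataclass
class IfThenRule:
    kri_name: str
    kri_threshold: float  # KRI is "crossed" when its reading >= this value
    kci_name: str
    kci_threshold: float  # KCI is "met" when its reading >= this value

def check_rules(rules, kri_readings, kci_readings):
    """Return rules whose KRI threshold is crossed while the corresponding
    KCI threshold is not met, i.e., risk above tolerance without adequate
    mitigation in place."""
    violations = []
    for rule in rules:
        kri_crossed = kri_readings[rule.kri_name] >= rule.kri_threshold
        kci_met = kci_readings[rule.kci_name] >= rule.kci_threshold
        if kri_crossed and not kci_met:
            violations.append(rule)
    return violations

# Hypothetical example: a capability KRI paired with a security-level KCI.
rules = [IfThenRule("bio_uplift_eval_score", 0.5, "security_level", 3.0)]
violations = check_rules(
    rules,
    kri_readings={"bio_uplift_eval_score": 0.62},  # threshold crossed
    kci_readings={"security_level": 2.0},          # mitigation not yet met
)
for rule in violations:
    print(f"ALERT: {rule.kri_name} crossed but {rule.kci_name} below its threshold")
```

The point of the pairing is that a crossed KRI is acceptable only while the matching KCI demonstrates the required level of mitigation; anything else is an alertable gap.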
Unlike in some other industries, where risks primarily materialize once the final system is deployed (e.g., an aircraft's safety risks emerge once it starts flying), AI systems can pose risks throughout their development cycle. For instance, loss-of-control scenarios could materialize during the training process itself, requiring continuous monitoring and risk mitigation well before deployment. Capability evaluation is therefore not a one-off affair: it should be repeated regularly during both training and deployment.

AI developers should establish rigorous evaluation protocols designed to produce upper-bound estimates of AI systems' capabilities, so that KRI thresholds are not crossed unnoticed. These protocols should specify the evaluation frequency in terms of both the relative variation of effective computing power used in training and fixed time intervals, the latter to account for post-training enhancements (Anthropic, 2024). Evaluations must be performed sufficiently frequently, and the elicitation methods used must be comprehensive enough to match the elicitation efforts of potential threat actors, including the use of increased test-time computing power. The evaluation environment and methodology must be documented, including how post-training enhancements are factored into capability assessments.

Similarly, AI developers should monitor KCIs to ensure that mitigation measures are functioning appropriately and meeting their KCI thresholds. Independent third parties should vet evaluation protocols, and should be granted the permission and resources to perform their own evaluations to verify the accuracy of the results. In addition, AI developers must commit to sharing evaluation results with relevant stakeholders as appropriate.
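As a rough illustration of such a frequency specification, the sketch below encodes two re-evaluation triggers: relative growth in effective training compute, and a fixed calendar interval to catch post-training enhancements that add capability without additional training compute. The 4x factor and 180-day interval are assumptions chosen for illustration; the paper does not prescribe specific values.

```python
import datetime

# Illustrative trigger values; the 4x effective-compute factor and the
# 180-day interval are assumptions, not values from the paper.
RELATIVE_COMPUTE_FACTOR = 4.0
FIXED_INTERVAL = datetime.timedelta(days=180)

def evaluation_due(effective_compute_now: float,
                   effective_compute_at_last_eval: float,
                   now: datetime.datetime,
                   last_eval_time: datetime.datetime) -> bool:
    """An evaluation is due when effective training compute has grown by the
    chosen relative factor since the last evaluation, OR when the fixed time
    interval has elapsed (covering post-training enhancements such as
    fine-tuning, scaffolding, or better prompting)."""
    compute_trigger = (effective_compute_now
                       >= RELATIVE_COMPUTE_FACTOR * effective_compute_at_last_eval)
    time_trigger = now - last_eval_time >= FIXED_INTERVAL
    return compute_trigger or time_trigger

# Example: 3.5x compute growth after 200 days -> due via the time trigger.
due = evaluation_due(3.5e25, 1.0e25,
                     datetime.datetime(2025, 7, 20),
                     datetime.datetime(2025, 1, 1))
print(due)  # True
```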
Reasoning
Implements continuous capability evaluation protocols with predetermined thresholds throughout the training and deployment lifecycle.
Risk Analysis and Evaluation
Risk analysis and evaluation is a process that starts with the definition of a risk tolerance. This risk tolerance is then operationalized into risk indicators and their corresponding mitigations required to reduce risk below the risk tolerance.
2.2.1 Risk Assessment
Risk Analysis and Evaluation > Setting a Risk Tolerance
A risk tolerance represents the aggregate level of risk that society is willing to accept from AI systems.
3 Ecosystem
Risk Analysis and Evaluation > Operationalizing Risk Tolerance
Risk tolerance must be operationalized into measurable criteria to be practically useful in day-to-day operations. A risk tolerance can be translated into (1) Key Risk Indicator (KRI) thresholds, which are thresholds on measurable signals that serve as proxies for risks, and (2) Key Control Indicator (KCI) thresholds, which are thresholds on measurable signals that serve as proxies for the level of mitigation achieved.
2.2.1 Risk Assessment
Risk Treatment
Risk treatment corresponds to the process of determining, implementing, and evaluating appropriate risk-reducing countermeasures.
2.2 Risk & AssuranceRisk Treatment > Implementing Mitigation Measures
AI developers should operationalize their KCI thresholds into mitigation measures.
2.3 Operations & SecurityRisk Governance
Risk governance corresponds to the rules and procedures that structure the risk management system in terms of decision-making, responsibilities, authority, and accountability mechanisms.
2.1.2 Roles & AccountabilityA Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
Campos, Simeon; Papadatos, Henry; Roger, Fabien; Touzet, Chloé; Quarks, Otter; Murray, Malcolm (2025)
The recent development of powerful AI systems has highlighted the need for robust risk management frameworks in the AI industry. Although companies have begun to implement safety frameworks, current approaches often lack the systematic rigor found in other high-risk industries. This paper presents a comprehensive risk management framework for the development of frontier AI that bridges this gap by integrating established risk management principles with emerging AI-specific practices. The framework consists of four key components: (1) risk identification (through literature review, open-ended red-teaming, and risk modeling), (2) risk analysis and evaluation using quantitative metrics and clearly defined thresholds, (3) risk treatment through mitigation measures such as containment, deployment controls, and assurance processes, and (4) risk governance establishing clear organizational structures and accountability. Drawing from best practices in mature industries such as aviation or nuclear power, while accounting for AI's unique challenges, this framework provides AI developers with actionable guidelines for implementing robust risk management. The paper details how each component should be implemented throughout the life-cycle of the AI system - from planning through deployment - and emphasizes the importance and feasibility of conducting risk management work prior to the final training run to minimize the burden associated with it.
Other (multiple stages)
Applies across multiple lifecycle stages
Developer
Entity that creates, trains, or modifies the AI system
Manage
Prioritising, responding to, and mitigating AI risks