Practices for running and protecting AI systems in production, including deployment, monitoring, incident response, and security controls.
Also in Organisation
Reasoning
The mitigation name "Cybersecurity" lacks a definition and supporting evidence; there is insufficient detail to identify the focal activity or where it occurs.
Least privilege access
Deployers of an AI system can restrict its permissions to a whitelist of predetermined options, such that any option not on the whitelist is inaccessible to the AI [200, 146]. The whitelist can be kept as small as the AI system's intended purpose allows, which reduces the attack surface exposed to external attackers and decreases the probability that the AI system accidentally takes actions with large unintended side effects.
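A minimal sketch of this pattern, assuming a hypothetical agent whose tool calls pass through a gate (the action names and handler table are illustrative, not from the cited frameworks):

```python
# Least-privilege gate: any action not on the predetermined whitelist
# is rejected before it can run. All names here are hypothetical.

ALLOWED_ACTIONS = {"search_docs", "summarize", "send_draft_email"}

def execute_action(action: str, handler_table: dict, **kwargs):
    """Run an action only if it appears on the whitelist."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Action {action!r} is not whitelisted")
    return handler_table[action](**kwargs)

# Example handler table; a real deployment would register vetted tools.
handlers = {"summarize": lambda text: text[:40]}

print(execute_action("summarize", handlers, text="Least privilege keeps the surface small."))
# A call such as execute_action("delete_files", handlers) raises PermissionError.
```

Keeping the whitelist small is what does the work here: every entry removed is one fewer behaviour an attacker or a misbehaving model can invoke.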
1.2.2 Runtime Environment
Protect proprietary or unreleased AI model architecture and parameters
The developers of AI models can invest in cybersecurity to prevent compute resources, training source code, model weights, and other critical resources from being accessed and copied by unauthorized third parties (e.g., through insider threats or supply chain attacks). Access to model source code and weights can be restricted through an access control scheme, such as role-based access control. If access to model outputs by third parties is required, it can be provided through an API. Air gaps can block unauthorized remote access. In the case of necessary interaction with an external network, network bandwidth limitations can also be enforced to increase the detection window of potential breaches [108].
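The role-based access control scheme mentioned above can be sketched as follows; the roles, resources, and permission strings are hypothetical examples, not any particular lab's actual scheme:

```python
# Role-based access control sketch: each role maps to an explicit set of
# "resource:action" permissions; anything absent is denied by default.

ROLE_PERMISSIONS = {
    "researcher":    {"training_code:read"},
    "infra_admin":   {"training_code:read", "weights:read"},
    "external_user": set(),  # may only reach model outputs via the API
}

def can_access(role: str, resource: str, action: str) -> bool:
    """Default-deny check: unknown roles and unlisted permissions fail."""
    return f"{resource}:{action}" in ROLE_PERMISSIONS.get(role, set())

print(can_access("infra_admin", "weights", "read"))   # True
print(can_access("researcher", "weights", "read"))    # False
```

Default-deny semantics matter here: a role added without an explicit weights permission cannot read the weights, which limits the blast radius of insider threats and compromised accounts.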
1.2.4 Security Infrastructure
Hardware limitations on data center network connections
Hardware-enforced bandwidth limitations on data center network connections can protect AI model weights from unauthorized access or exfiltration, by limiting the speed of model weight access on the connections between data centers and the outside world. Such limitations can be put in place in multiple ways, for example by only constructing connections with a specific bandwidth. The output rate on all data channels can be set low enough that copying the weights is possible in principle (e.g., to enable regular backups), but would take so long that an unauthorized exfiltration of the weights could be detected and prevented. Such rate-limiting is only effective if it applies to all output connections for all storage locations on which the weights of the model are stored [139].
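The detection-window argument is back-of-the-envelope arithmetic; with illustrative figures (both numbers below are hypothetical, not measurements of any real system):

```python
# Rough detection window for weight exfiltration under a hardware
# bandwidth cap. Both figures are illustrative assumptions.

weights_bytes = 1.4e12    # ~1.4 TB of model weights (hypothetical)
cap_bits_per_s = 50e6     # hardware-enforced 50 Mbit/s outbound cap

seconds = weights_bytes * 8 / cap_bits_per_s
days = seconds / 86400
print(f"A full copy would take about {days:.1f} days")  # about 2.6 days
```

A multi-day transfer gives monitoring systems time to flag the sustained outbound flow and cut the connection, whereas an uncapped link could move the same weights in minutes. As the source notes, the cap only helps if it covers every output channel of every location where the weights are stored.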
1.2.4 Security Infrastructure
Structured access to a model
Structured access refers to methods which limit users’ or deployers’ direct access to a model’s parameters by constraining access to a model through a centralized access point (e.g., an API) [88]. This access point can be monitored for usage, and access can be revoked to users or downstream deployers in cases of misuse [105]. Within this centralized access point, automated filtering-based monitoring can be done on both inputs and outputs to ensure the model’s intended use is preserved [36]. This filtering can sometimes be supplemented by human oversight, depending on desired robustness levels.
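A minimal sketch of such a centralized access point, combining revocation with input and output filtering (the function names, blocklist, and stand-in model are all hypothetical):

```python
# Structured-access sketch: every query passes through one gateway that
# checks revocation status and filters both inputs and outputs.
# Names and the keyword blocklist are illustrative only; production
# filters would use trained classifiers rather than substring matching.

REVOKED = set()                        # user IDs banned after misuse
BLOCKED_TERMS = {"synthesize nerve agent"}  # stand-in for a policy filter

def filtered_query(user_id: str, prompt: str, model) -> str:
    if user_id in REVOKED:
        raise PermissionError("access revoked after misuse")
    if any(t in prompt.lower() for t in BLOCKED_TERMS):
        return "[request refused by input filter]"
    output = model(prompt)
    if any(t in output.lower() for t in BLOCKED_TERMS):
        return "[response withheld by output filter]"
    return output

echo_model = lambda p: f"echo: {p}"    # trivial stand-in for a real model
print(filtered_query("alice", "hello", echo_model))  # echo: hello
```

Because all traffic crosses this one point, usage can be logged for monitoring, and adding a user ID to `REVOKED` cuts off a misbehaving user or downstream deployer without touching the model itself.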
2.3.2 Access & Security Controls
AI systems can be developed and tested within a sandbox (a secure, isolated environment for separating running programs), such that outside access to information within the sandbox is restricted. Within this environment, resources such as storage, memory, and network access are disallowed or heavily restricted [15]. With sandboxing, dangerous or harmful outputs generated during testing are contained.
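One narrow ingredient of sandboxing, resource restriction, can be sketched with OS-level limits; this is a Unix-only illustration, and full isolation (network, filesystem) would additionally need namespaces or containers, which are omitted here:

```python
# Unix-only sketch: run untrusted code in a child process with hard caps
# on CPU time and address space. This shows resource restriction only;
# real sandboxes also isolate network and filesystem access.
import resource
import subprocess
import sys

def limit_resources():
    # Applied in the child process just before exec.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))              # 2 s CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)   # 512 MiB

proc = subprocess.run(
    [sys.executable, "-c", "print('running inside the sandbox')"],
    preexec_fn=limit_resources,
    capture_output=True, text=True, timeout=10,
)
print(proc.stdout.strip())  # running inside the sandbox
```

A child that exceeds the CPU or memory cap is killed by the kernel, so a runaway test workload cannot starve the host.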
1.2.2 Runtime Environment
Model development
2.4 Engineering & Development
Model development > Data-related
1.1 Model
Model evaluations
2.2.2 Testing & Evaluation
Model evaluations > General evaluations
2.2.2 Testing & Evaluation
Model evaluations > Benchmarking
3.2.1 Benchmarks & Evaluation
Model evaluations > Red teaming
2.2.2 Testing & Evaluation
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
Gipiškis, Rokas; San Joaquin, Ayrton; Chin, Ze Shen; Regenfuß, Adrian; Gil, Ariel; Holtman, Koen (2024)
Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI risk mitigation frameworks is fragmented, uses inconsistent terminology, and has gaps in coverage. This paper introduces a preliminary AI Risk Mitigation Taxonomy to organize AI risk mitigations and provide a common frame of reference. The Taxonomy was developed through a rapid evidence scan of 13 AI risk mitigation frameworks published between 2023 and 2025, which were extracted into a living database of 831 distinct AI risk mitigations.
Other (multiple stages)
Applies across multiple lifecycle stages
Unable to classify
Could not be classified to a specific actor type
Unable to classify
Could not be classified to a specific AIRM function