Agency (Situational Awareness)
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm due to malicious human actors, misaligned AI systems, or failure in the AI system.
Human
Due to a decision or action made by humans
AI system
Due to a decision or action made by an AI system
Other
Due to some other reason or is ambiguous
Not coded
Intentional
Due to an expected outcome from pursuing a goal
Unintentional
Due to an unexpected outcome from pursuing a goal
Other
Without clearly specifying the intentionality
Not coded
Pre-deployment
Occurring before the AI is deployed
Post-deployment
Occurring after the AI model has been trained and deployed
Other
Without a clearly specified time of occurrence
Not coded
Sub-categories (2)
Situational awareness in AI systems
"Situational awareness in GPAI systems refers to the ability to understand its context, environment, and use this to inform action. This can range from basic environmental mapping and trajectory estimation (as in a robot vacuum cleaner) to sophisticated understanding of its training, evaluation, or deployment status. In more advanced systems this may enable undesired behavior, such as deceptive behavior during evaluations, or persuasion during deployment."
7.2 AI possessing dangerous capabilities
Strategic underperformance on model evaluations
"GPAI developers often run evaluations of dual-use capabilities to decide whether it is safe to deploy. In some cases, these evaluations may fail to elicit these capabilities, either due to benign reasons or strategic action - by either the developers, malicious actors, or arise unintentionally in the model during training [84, 97]. A GPAI model may strategically underperform or limit its performance during capability evaluations in order to be classified as safe for deployment. This underperformance could prevent the model from being identified as potentially dual use."
7.1 AI pursuing its own goals in conflict with human goals or values
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content