Situational awareness in AI systems
This category covers AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm through malicious human actors, misaligned AI systems, or failures of the AI system itself.
"Situational awareness in GPAI systems refers to the ability to understand its context, environment, and use this to inform action. This can range from basic environmental mapping and trajectory estimation (as in a robot vacuum cleaner) to sophisticated understanding of its training, evaluation, or deployment status. In more advanced systems this may enable undesired behavior, such as deceptive behavior during evaluations, or persuasion during deployment."(p. 32)
Supporting Evidence (1)
"For a GPAI model, the types of awareness can include [140, 25]:
• Environment: Understanding and modeling the physical or digital environment in which it operates.
• Context: Identifying whether it is in training, testing, evaluation, or deployment, as well as knowledge of its capabilities, limitations, and techniques used in training.
• User: Understanding user expectations, inferring personal characteristics (e.g., age, political leaning, education), and expected responses to the AI’s actions."(p. 32)
Part of Agency (Situational Awareness)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms) | 1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism | 1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity | 1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content | 1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm | 1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm | 1.2 Exposure to toxic content