Deceptive behavior leading to unauthorized actions
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm due to malicious human actors, misaligned AI systems, or failure in the AI system.
"AI systems can create false or misleading claims that can lead to unauthorized actions, even in some cases violating the terms and conditions set by the model provider [79, 1]. For example, an AI system can claim that it is not collecting data from its current interaction with the user, in line with the provider’s policies, but the system still stores the user’s input without deleting it after the session. This harms both the user and the provider, as the provider is exposed to increased legal liability due to the model’s actions." (p. 31)
Part of Agency (Deception)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content