AI system bypassing a sandboxed environment
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm, including deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may lead to mass harm through malicious human actors, misaligned AI systems, or failures in the AI system itself.
"An AI system may have the ability to bypass a sandboxed environment in which it is trained or evaluated." (p. 42)
Supporting Evidence (1)
"For example, the AI system can achieve this by finding and using misconfigurations or vulnerabilities in the software of the sandboxed environment. This can also occur if the AI system finds and uses vulnerabilities of the hardware it is being run on, or by using social engineering techniques on the users or administrators of the sandboxed environment [74]. The developers or malicious actors may intentionally create such behavior (e.g., by inserting backdoors), or it can occur unintentionally, with the AI system bypassing the developer-intended domain of operation [1]."
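To make the "misconfigurations or vulnerabilities in the software of the sandboxed environment" failure mode concrete, here is a minimal, hypothetical Python sketch (not from the cited source). It shows a common misconfiguration: a "sandbox" that evaluates untrusted expressions after stripping built-in functions, on the assumption that this removes all dangerous capabilities. Python's object introspection still provides a path from a harmless literal back to arbitrary classes, so the sandbox boundary does not hold.

```python
# Toy illustration of a software misconfiguration in a sandbox.
# The sandbox author removes __builtins__, assuming that blocks all
# access to powerful functionality. Introspection defeats this.

def naive_sandbox_eval(expr: str):
    """Evaluate untrusted expr with no builtins -- a common but
    insufficient sandboxing strategy."""
    return eval(expr, {"__builtins__": {}}, {})

# Direct use of a builtin is blocked, as the sandbox author intended:
try:
    naive_sandbox_eval("open('/etc/hostname')")
    blocked = None
except NameError as e:
    blocked = type(e).__name__  # "NameError": 'open' is not defined

# ...but introspection walks from an empty tuple to `object` and then
# to every loaded subclass, recovering classes the sandbox never meant
# to expose (here we merely locate one by name to prove the escape):
escape = naive_sandbox_eval(
    "[c for c in ().__class__.__base__.__subclasses__()"
    " if c.__name__ == 'BuiltinImporter'][0]"
)
print(blocked, escape.__name__)
```

The same pattern generalizes: any capability reachable through the object graph of the host process remains reachable from inside such a sandbox, which is why robust isolation relies on process- or hardware-level boundaries rather than in-language filtering.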
Part of Cybersecurity