
AI System bypassing a sandbox environment

Risk Domain

AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm due to malicious human actors, misaligned AI systems, or failure in the AI system.

"An AI system may have the ability to bypass a sandboxed environment in which it is trained or evaluated."(p. 42)

Supporting Evidence (1)

1. "For example, the AI system can achieve this by finding and using misconfigurations or vulnerabilities in the software of the sandboxed environment. This can also occur if the AI system finds and uses vulnerabilities of the hardware it is being run on, or by using social engineering techniques on the users or administrators of the sandboxed environment [74]. The developers or malicious actors may intentionally create such behavior (e.g., by inserting backdoors), or it can occur unintentionally, with the AI system bypassing the developer-intended domain of operation [1]."
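A classic, widely taught instance of the "software misconfiguration" failure mode described above (not an example from the cited paper) is a naive Python restriction that strips builtins from `eval()` and treats that as a sandbox. Attribute access is still permitted, so object introspection can reach back into the full interpreter. A minimal sketch, for illustration only:

```python
# A naive "sandbox": evaluate untrusted code with no builtins available.
# The restriction is a misconfiguration, not isolation: every Python
# object still exposes the class hierarchy via introspection.
untrusted = "().__class__.__base__.__subclasses__()"

# Empty __builtins__ removes open(), __import__(), etc. ...
result = eval(untrusted, {"__builtins__": {}}, {})

# ...yet the expression still enumerates every loaded class in the
# interpreter, a stepping stone to regaining full capability.
print(type(result), len(result) > 0)
```

Real sandboxes therefore rely on process- or VM-level isolation (e.g., separate processes with OS-enforced restrictions) rather than language-level namespace filtering.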

Part of Cybersecurity
