AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm due to malicious human actors, misaligned AI systems, or failure in the AI system.
"Future AI systems may gain access to websites and engage in real-world actions, potentially yielding a more substantial impact on the world (Nakano et al., 2021). They may disseminate false information, deceive users, disrupt network security, and, in more dire scenarios, be compromised by malicious actors for ill purposes. Moreover, their increased access to data and resources can facilitate self-proliferation, posing existential risks (Shevlane et al., 2023)." (p. 7)
Part of Double edge components
Other risks from Ji et al. (2023) (16)
Causes of Misalignment
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Reward Hacking
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Goal Misgeneralization
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Reward Tampering
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Limitations of Human Feedback
7.0 AI System Safety, Failures & Limitations
Causes of Misalignment > Limitations of Reward Modeling
7.1 AI pursuing its own goals in conflict with human goals or values