Deception
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may cause mass harm due to malicious human actors, misaligned AI systems, or failure in the AI system.
"Cases of AI systems deceiving humans to carry out tasks or meet goals.139"(p. 30)
Part of Dangerous capabilities in AI systems
Other risks from Maas (2023) (25)
Alignment failures in existing ML systems
7.1 AI pursuing its own goals in conflict with human goals or values
Alignment failures in existing ML systems > Faulty reward functions in the wild
7.1 AI pursuing its own goals in conflict with human goals or values
Alignment failures in existing ML systems > Specification gaming
7.1 AI pursuing its own goals in conflict with human goals or values
Alignment failures in existing ML systems > Reward model overoptimization
7.1 AI pursuing its own goals in conflict with human goals or values
Alignment failures in existing ML systems > Instrumental convergence
7.1 AI pursuing its own goals in conflict with human goals or values
Alignment failures in existing ML systems > Goal misgeneralization
7.1 AI pursuing its own goals in conflict with human goals or values