Dangerous capabilities in AI systems
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm, including deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may lead to mass harm through malicious human actors, misaligned AI systems, or failures within the AI system itself.
Sub-categories (7)
Situational awareness (7.2 AI possessing dangerous capabilities)
"cases where a large language model displays awareness that it is a model, and it can recognize whether it is currently in testing or deployment"

Acquisition of a goal to harm society (4.2 Cyberattacks, weapon development or use, and mass harm)
"cases of AI systems being given the outright goal of harming humanity (ChaosGPT)"

Acquisition of goals to seek power and control (7.1 AI pursuing its own goals in conflict with human goals or values)
"cases where AI systems converge on optimal policies of seeking power over their environment"

Self-improvement (7.2 AI possessing dangerous capabilities)
"examples of cases where AI systems improve AI systems"

Autonomous replication (7.2 AI possessing dangerous capabilities)
"the ability of simple software to autonomously spread around the internet in spite of countermeasures (various software worms and computer viruses)"

Anonymous resource acquisition (7.2 AI possessing dangerous capabilities)
"the demonstrated ability of anonymous actors to accumulate resources online (e.g., Satoshi Nakamoto as an anonymous crypto billionaire)"

Deception (7.2 AI possessing dangerous capabilities)
"cases of AI systems deceiving humans to carry out tasks or meet goals"

Other risks from Maas (2023) (25)
Alignment failures in existing ML systems (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Faulty reward functions in the wild (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Specification gaming (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Reward model overoptimization (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Instrumental convergence (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Goal misgeneralization (7.1 AI pursuing its own goals in conflict with human goals or values)