Specification gaming
AI systems acting in conflict with human goals or values (especially the goals of designers or users) or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.
"AI systems game specifications [305]. For example, in 2017 an OpenAI robot trained to grasp a ball via human feedback from a fixed viewpoint learned that it was easier to pretend to grasp the ball by placing its hand between the camera and the target object, as this was easier to learn than actually grasping the ball [103]." (p. 10)
Part of Harm caused by unaligned competent systems
Other risks from Leech et al. (2024) (13)
Harm caused by incompetent systems
7.3 Lack of capability or robustness

Harm caused by unaligned competent systems
7.1 AI pursuing its own goals in conflict with human goals or values

Harm caused by unaligned competent systems > Emergent goals
7.1 AI pursuing its own goals in conflict with human goals or values

Harm caused by unaligned competent systems > Deceptive alignment
7.2 AI possessing dangerous capabilities

Within-country issues: domestic inequality
6.1 Power centralization and unfair distribution of benefits

Within-country issues: domestic inequality > Demographic diversity of researchers
6.1 Power centralization and unfair distribution of benefits