Specification gaming

Ten Hard Problems in Artificial Intelligence We Must Get Right

Leech et al. (2024)

Sub-category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.

"AI systems game specifications [305]. For example, in 2017 an OpenAI robot trained to grasp a ball via human feedback from a fixed viewpoint learned that it was easier to pretend to grasp the ball by placing its hand between the camera and the target object, as this was easier to learn than actually grasping the ball [103]." (p. 10)

Part of Harm caused by unaligned competent systems
