AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities, such as manipulation, deception, and situational awareness, to seek power, self-proliferate, or achieve other goals.
AI agents are directed by goals and objectives. Creating general-purpose objectives that capture human values could be challenging... Since goal-directed AI systems need measurable objectives, by default our systems may pursue simplified proxies of human values. The result could be suboptimal or even catastrophic if a sufficiently powerful AI successfully optimizes its flawed objective to an extreme degree (p. 13)
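The dynamic described above can be illustrated with a minimal numerical sketch. All names and functions here are hypothetical, chosen only to show how hill-climbing on a simplified, measurable proxy can drive the unmeasured true objective down once optimization pushes past the point where proxy and value diverge:

```python
# Toy illustration (hypothetical): optimizing a proxy of the true objective.

def true_value(x):
    # The (unmeasured) true objective: improves up to x = 1, then degrades.
    return x - 0.5 * x * x

def proxy_value(x):
    # The measurable proxy: rewards x without bound (misspecified).
    return x

# A simple optimizer that hill-climbs on the proxy.
x = 0
for _ in range(10):
    x += 1  # every step strictly increases proxy_value(x)

print(proxy_value(x))  # proxy keeps rising: 10
print(true_value(x))   # true objective has collapsed: -40.0
```

The proxy and the true objective agree near the start (both rise for small x), which is exactly why the proxy looks like a reasonable target; the divergence only appears under sufficiently strong optimization pressure.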
Other risks from Hendrycks & Mazeika (2022) (7)
Weaponization: 4.2 Cyberattacks, weapon development or use, and mass harm
Enfeeblement: 5.2 Loss of human agency and autonomy
Eroded epistemics: 3.2 Pollution of information ecosystem and loss of consensus reality
Value lock-in: 6.1 Power centralization and unfair distribution of benefits
Emergent functionality: 7.2 AI possessing dangerous capabilities
Deception: 7.1 AI pursuing its own goals in conflict with human goals or values