AI systems acting in conflict with human goals or values, especially those of their designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI systems using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.
Deception can help agents achieve their goals. It may be more efficient to gain human approval through deception than to earn human approval legitimately. ... Strong AIs that can deceive humans could undermine human control. ... Once deceptive AI systems are cleared by their monitors or once such systems can overpower them, these systems could take a "treacherous turn" and irreversibly bypass human control (p. 14).
Other risks from Hendrycks & Mazeika (2022) (7)

Weaponization: 4.2 Cyberattacks, weapon development or use, and mass harm
Enfeeblement: 5.2 Loss of human agency and autonomy
Eroded epistemics: 3.2 Pollution of information ecosystem and loss of consensus reality
Proxy misspecification: 7.1 AI pursuing its own goals in conflict with human goals or values
Value lock-in: 6.1 Power centralization and unfair distribution of benefits
Emergent functionality: 7.2 AI possessing dangerous capabilities