
Power-seeking behavior

X-Risk Analysis for AI Research

Hendrycks & Mazeika (2022)

Category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, and situational awareness to seek power, self-proliferate, or achieve other goals.

Agents that have more power are better able to accomplish their goals. Therefore, it has been shown that agents have incentives to acquire and maintain power. AIs that acquire substantial power can become especially dangerous if they are not aligned with human values (p. 14).
