Skip to main content
Home/Risks/Hendrycks & Mazeika (2022)/Proxy misspecification

Proxy misspecification

X-Risk Analysis for AI Research

Hendrycks & Mazeika (2022)

Category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.

AI agents are directed by goals and objectives. Creating general-purpose objectives that capture human values could be challenging... Since goal-directed AI systems need measurable objectives, by default our systems may pursue simplified proxies of human values. The result could be suboptimal or even catastrophic if a sufficiently powerful AI successfully optimizes its flawed objective to an extreme degree(p. 13)

Other risks from Hendrycks & Mazeika (2022) (7)