AI objectives misaligned with human intentions
Risk Domain
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviours may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI systems using dangerous capabilities, such as manipulation, deception, or situational awareness, to seek power, self-proliferate, or achieve other goals.
"AI models and systems might develop goals that diverge from human intentions."(p. 11)
Entity: who or what caused the harm
Intent: whether the harm was intentional or accidental
Timing: whether the risk arises pre- or post-deployment
Other risks from Uuk2025 (60)
| Risk | Subdomain | Entity | Intent | Timing |
| --- | --- | --- | --- | --- |
| Control | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Intentional | Post-deployment |
| Democracy | 6.0 Socioeconomic & Environmental | Other | Other | Other |
| Discrimination | 1.1 Unfair discrimination and misrepresentation | Other | Other | Post-deployment |
| Economy | 6.2 Increased inequality and decline in employment quality | Other | Other | Post-deployment |
| Environment | 6.6 Environmental harm | AI system | Unintentional | Post-deployment |
| Governance | 6.5 Governance failure | AI system | Unintentional | Post-deployment |