Long-term & Existential Risk
Risk Domain
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development (for example, through reward hacking or goal misgeneralisation), or may arise when an AI uses dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.
"The speculative potential for future advanced AI systems to harm human civilization, either through misuse or due to challenges in aligning AI objectives with human values."(p. 23048)
Entity: who or what caused the harm
Intent: whether the harm was intentional or accidental
Timing: whether the risk arises pre- or post-deployment
Other risks from Sherman & Eisenberg (2023) (8)
| Risk (Sherman & Eisenberg, 2023) | Mapped subdomain | Entity | Intent | Timing |
| --- | --- | --- | --- | --- |
| Abuse & Misuse | 4.2 Cyberattacks, weapon development or use, and mass harm | Human | Intentional | Post-deployment |
| Compliance | 6.5 Governance failure | AI system | Other | Post-deployment |
| Environmental & Societal Impact | 6.0 Socioeconomic & Environmental | Other | Other | Post-deployment |
| Explainability & Transparency | 7.4 Lack of transparency or interpretability | AI system | Other | Other |
| Fairness & Bias | 1.1 Unfair discrimination and misrepresentation | AI system | Unintentional | Other |
| Performance & Robustness | 7.3 Lack of capability or robustness | AI system | Unintentional | Post-deployment |
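For readers who want to work with these mappings programmatically, the entries above can be encoded as plain records and filtered by causal-taxonomy field. This is a minimal illustrative sketch: the `RiskMapping` record layout and field names are my own, not a published schema of the repository.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskMapping:
    """One mapping: a risk from Sherman & Eisenberg (2023) linked to a
    repository subdomain, plus the Entity/Intent/Timing taxonomy labels."""
    risk: str
    subdomain: str
    entity: str   # who or what caused the harm
    intent: str   # "Intentional", "Unintentional", or "Other"
    timing: str   # "Pre-deployment", "Post-deployment", or "Other"

# The six mappings listed above, transcribed verbatim.
MAPPINGS = [
    RiskMapping("Abuse & Misuse",
                "4.2 Cyberattacks, weapon development or use, and mass harm",
                "Human", "Intentional", "Post-deployment"),
    RiskMapping("Compliance", "6.5 Governance failure",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Environmental & Societal Impact",
                "6.0 Socioeconomic & Environmental",
                "Other", "Other", "Post-deployment"),
    RiskMapping("Explainability & Transparency",
                "7.4 Lack of transparency or interpretability",
                "AI system", "Other", "Other"),
    RiskMapping("Fairness & Bias",
                "1.1 Unfair discrimination and misrepresentation",
                "AI system", "Unintentional", "Other"),
    RiskMapping("Performance & Robustness",
                "7.3 Lack of capability or robustness",
                "AI system", "Unintentional", "Post-deployment"),
]

# Example query: risks attributed to the AI system itself that
# manifest after deployment.
post_deploy_ai = [m.risk for m in MAPPINGS
                  if m.entity == "AI system"
                  and m.timing == "Post-deployment"]
print(post_deploy_ai)  # ['Compliance', 'Performance & Robustness']
```

Encoding the table this way makes the three-field causal taxonomy directly queryable, so the same records can be sliced by entity, intent, or timing without re-reading the table.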