Unintended consequences
Category: Risk Domain
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities, such as manipulation, deception, and situational awareness, to seek power, self-proliferate, or achieve other goals.
"Sometimes an AI finds ways to achieve its given goals in ways that are completely different from what its creators had in mind." (p. 9)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Hogenhout (2021) (12)
| Risk | Subdomain | Entity | Intent | Timing |
| --- | --- | --- | --- | --- |
| Incompetence | 7.3 Lack of capability or robustness | AI system | Unintentional | Post-deployment |
| Loss of privacy | 2.1 Compromise of privacy by leaking or correctly inferring sensitive information | Human | Intentional | Post-deployment |
| Discrimination | 1.1 Unfair discrimination and misrepresentation | AI system | Unintentional | Post-deployment |
| Bias | 1.1 Unfair discrimination and misrepresentation | AI system | Unintentional | Pre-deployment |
| Erosion of Society | 3.2 Pollution of information ecosystem and loss of consensus reality | AI system | Unintentional | Post-deployment |
| Lack of transparency | 7.4 Lack of transparency or interpretability | AI system | Unintentional | Other |