Intelligibility
Category: Risk Domain
Challenges in understanding or explaining the decision-making processes of AI systems, which can lead to mistrust, difficulty in enforcing compliance standards or holding relevant actors accountable for harms, and the inability to identify and correct errors.
"How can we build agent’s whose decisions we can understand? Con- nects explainable decisions (Berkeley) and informed oversight (MIRI)."(p. 9)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Everitt, Lea & Hutter (2018) (8)

| Risk | Risk Domain | Entity | Intent | Timing |
|---|---|---|---|---|
| Value specification | 7.1 AI pursuing its own goals in conflict with human goals or values | Human | Other | Post-deployment |
| Reliability | 7.1 AI pursuing its own goals in conflict with human goals or values | Human | Other | Post-deployment |
| Corrigibility | 7.1 AI pursuing its own goals in conflict with human goals or values | Other | Unintentional | Other |
| Security | 2.2 AI system security vulnerabilities and attacks | Human | Unintentional | Pre-deployment |
| Safe learning | 7.3 Lack of capability or robustness | AI system | Unintentional | Pre-deployment |
| Subagents | 7.2 AI possessing dangerous capabilities | AI system | Intentional | Post-deployment |