Safe learning
Category: Risk Domain
AI systems that fail to perform reliably or effectively under varying conditions, leading to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
"AGIs should avoid making fatal mistakes during the learning phase. Subproblems include safe exploration and distributional shift (DeepMind, OpenAI), and continual learning (Berkeley)." (p. 9)
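The "safe exploration" subproblem named in the quote can be illustrated with a minimal sketch: an agent that explores, but only over actions a safety predicate permits. Everything here (the action set, the `UNSAFE` set, the `is_safe` predicate, and the `safe_epsilon_greedy` function) is a hypothetical illustration, not a method from the cited review.

```python
import random

# Illustrative sketch of safe exploration: epsilon-greedy action selection
# restricted to a safe subset, so the agent never tries a known-fatal
# action while it is still learning. All names below are hypothetical.

ACTIONS = ["left", "right", "forward", "jump"]
UNSAFE = {"jump"}  # assume this action risks a fatal, unrecoverable mistake


def is_safe(action):
    """Hypothetical safety predicate: reject actions assumed to be fatal."""
    return action not in UNSAFE


def safe_epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action epsilon-greedily, but restrict both the exploration
    and the exploitation step to the safe subset of actions."""
    safe_actions = [a for a in ACTIONS if is_safe(a)]
    if random.random() < epsilon:
        return random.choice(safe_actions)  # explore, but only safely
    # exploit: best-valued action among the safe ones
    return max(safe_actions, key=lambda a: q_values.get(a, 0.0))
```

Action masking of this kind is only one approach; the literature also frames safe exploration via constrained optimization or human oversight during learning.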
- Entity: who or what caused the harm
- Intent: whether the harm was intentional or accidental
- Timing: whether the risk is pre- or post-deployment
Other risks from Everitt, Lea & Hutter (2018) (8)
| Risk | Risk Domain | Entity | Intent | Timing |
|---|---|---|---|---|
| Value specification | 7.1 AI pursuing its own goals in conflict with human goals or values | Human | Other | Post-deployment |
| Reliability | 7.1 AI pursuing its own goals in conflict with human goals or values | Human | Other | Post-deployment |
| Corrigibility | 7.1 AI pursuing its own goals in conflict with human goals or values | Other | Unintentional | Other |
| Security | 2.2 AI system security vulnerabilities and attacks | Human | Unintentional | Pre-deployment |
| Intelligibility | 7.4 Lack of transparency or interpretability | Human | Unintentional | Pre-deployment |
| Subagents | 7.2 AI possessing dangerous capabilities | AI system | Intentional | Post-deployment |
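The coding scheme above (risk name, domain, plus the Entity/Intent/Timing dimensions) can be captured as a small data structure, which makes queries over the taxonomy straightforward. The `RiskEntry` class and `RISKS` list are an illustrative representation I am introducing here; the field values are taken verbatim from the table.

```python
from dataclasses import dataclass

# Illustrative encoding of the table above. Only the field values come
# from the source; the dataclass and variable names are hypothetical.


@dataclass(frozen=True)
class RiskEntry:
    name: str
    domain: str   # risk-domain code, e.g. "7.1"
    entity: str   # who or what caused the harm
    intent: str   # intentional, unintentional, or other
    timing: str   # pre-deployment, post-deployment, or other


RISKS = [
    RiskEntry("Value specification", "7.1", "Human", "Other", "Post-deployment"),
    RiskEntry("Reliability", "7.1", "Human", "Other", "Post-deployment"),
    RiskEntry("Corrigibility", "7.1", "Other", "Unintentional", "Other"),
    RiskEntry("Security", "2.2", "Human", "Unintentional", "Pre-deployment"),
    RiskEntry("Intelligibility", "7.4", "Human", "Unintentional", "Pre-deployment"),
    RiskEntry("Subagents", "7.2", "AI system", "Intentional", "Post-deployment"),
]

# Example query: which of these risks surface only after deployment?
post_deployment = [r.name for r in RISKS if r.timing == "Post-deployment"]
```

A query like the one at the end shows why the three-dimension coding is useful: filtering by timing, entity, or intent becomes a one-line expression.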