Category: Meta-cognition
Risk Domain: AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
"Agents that reason about their own computational resources and logically uncertain events can encounter strange paradoxes due to Gödelian limitations (Fallenstein and Soares, 2015; Soares and Fallenstein, 2014, 2017) and shortcomings of probability theory (Soares and Fallenstein, 2014, 2015, 2017). They may also be reflectively unstable, preferring to change the principles by which they select actions (Arbital, 2018)." (p. 10)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Everitt, Lea & Hutter (2018) (8)
Category            | Risk Domain                                                          | Entity    | Intent        | Timing
Value specification | 7.1 AI pursuing its own goals in conflict with human goals or values | Human     | Other         | Post-deployment
Reliability         | 7.1 AI pursuing its own goals in conflict with human goals or values | Human     | Other         | Post-deployment
Corrigibility       | 7.1 AI pursuing its own goals in conflict with human goals or values | Other     | Unintentional | Other
Security            | 2.2 AI system security vulnerabilities and attacks                   | Human     | Unintentional | Pre-deployment
Safe learning       | 7.3 Lack of capability or robustness                                 | AI system | Unintentional | Pre-deployment
Intelligibility     | 7.4 Lack of transparency or interpretability                         | Human     | Unintentional | Pre-deployment
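The Entity/Intent/Timing causal taxonomy used to tag these risks can be sketched as a small data structure. This is purely an illustrative encoding, not an API from the repository or the paper; the class and field names here are hypothetical, and only a subset of the rows above is shown.

```python
from dataclasses import dataclass

# Illustrative encoding of one record in the causal taxonomy
# (Entity / Intent / Timing). A sketch, not the repository's schema.
@dataclass(frozen=True)
class Risk:
    category: str     # e.g. "Meta-cognition", "Value specification"
    risk_domain: str  # e.g. "7.3 Lack of capability or robustness"
    entity: str       # who or what caused the harm: "Human", "AI system", "Other"
    intent: str       # "Intentional", "Unintentional", "Other"
    timing: str       # "Pre-deployment", "Post-deployment", "Other"

# A subset of the related risks listed above
risks = [
    Risk("Value specification",
         "7.1 AI pursuing its own goals in conflict with human goals or values",
         "Human", "Other", "Post-deployment"),
    Risk("Security",
         "2.2 AI system security vulnerabilities and attacks",
         "Human", "Unintentional", "Pre-deployment"),
    Risk("Safe learning",
         "7.3 Lack of capability or robustness",
         "AI system", "Unintentional", "Pre-deployment"),
]

# The tags make simple queries possible, e.g. risks arising before deployment:
pre = [r.category for r in risks if r.timing == "Pre-deployment"]
print(pre)  # ['Security', 'Safe learning']
```

Tagging each record with the three causal dimensions, rather than free text, is what lets the repository filter and cross-tabulate risks in this way.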