Meta-cognition

AGI Safety Literature Review

Everitt, Lea & Hutter (2018)

Category: Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, leaving them prone to errors and failures with significant consequences, especially in critical applications or domains requiring moral reasoning.

"Agents that reason about their own computational resources and logically uncertain events can encounter strange paradoxes due to Godelian limitations (Fallenstein and Soares, 2015; Soares and Fallenstein, 2014, 2017) and shortcomings of probability theory (Soares and Fallenstein, 2014, 2015, 2017). They may also be reflectively unstable, preferring to change the principles by which they select actions (Arbital, 2018)."(p. 10)

Other risks from Everitt, Lea & Hutter (2018): 8