Gradual, irretrievable ceding of human power over the future to AI systems
Risk Domain
Humans delegating key decisions to AI systems, or AI systems making decisions that diminish human control and autonomy, potentially leading to humans feeling disempowered, losing the ability to shape a fulfilling life trajectory, or becoming cognitively enfeebled. (p. 32)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Maas (2023) (25)
| Risk (Maas, 2023) | Risk subdomain | Entity | Intent | Timing |
| --- | --- | --- | --- | --- |
| Alignment failures in existing ML systems | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Unintentional | Other |
| Alignment failures in existing ML systems > Faulty reward functions in the wild | 7.1 AI pursuing its own goals in conflict with human goals or values | Human | Unintentional | Post-deployment |
| Alignment failures in existing ML systems > Specification gaming | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Intentional | Other |
| Alignment failures in existing ML systems > Reward model overoptimization | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Intentional | Other |
| Alignment failures in existing ML systems > Instrumental convergence | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Intentional | Post-deployment |
| Alignment failures in existing ML systems > Goal misgeneralization | 7.1 AI pursuing its own goals in conflict with human goals or values | AI system | Unintentional | Other |