Security
Category: Risk Domain
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"How to design AGIs that are robust to adversaries and adversarial environ- ments? This involves building sandboxed AGI protected from adversaries (Berkeley), and agents that are robust to adversarial inputs (Berkeley, DeepMind)."(p. 9)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Everitt, Lea & Hutter (2018) (8)
Value specification: 7.1 AI pursuing its own goals in conflict with human goals or values (Entity: Human; Intent: Other; Timing: Post-deployment)
Reliability: 7.1 AI pursuing its own goals in conflict with human goals or values (Entity: Human; Intent: Other; Timing: Post-deployment)
Corrigibility: 7.1 AI pursuing its own goals in conflict with human goals or values (Entity: Other; Intent: Unintentional; Timing: Other)
Safe learning: 7.3 Lack of capability or robustness (Entity: AI system; Intent: Unintentional; Timing: Pre-deployment)
Intelligibility: 7.4 Lack of transparency or interpretability (Entity: Human; Intent: Unintentional; Timing: Pre-deployment)
Subagents: 7.2 AI possessing dangerous capabilities (Entity: AI system; Intent: Intentional; Timing: Post-deployment)