
Loss of control

Risk Domain

AI systems acting in conflict with human goals or values — especially the goals of designers or users — or with ethical standards. These misaligned behaviours may be introduced by humans during design and development (for example, through reward hacking or goal misgeneralisation), or may arise when AI systems use dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.

"‘Loss of control’ scenarios are hypothetical future scenarios in which one or more general-purpose AI systems come to operate outside of anyone’s control, with no clear path to regaining control. These scenarios vary in their severity, but some experts give credence to outcomes as severe as the marginalisation or extinction of humanity." (p. 100)

Supporting Evidence (2)

1. "Two key requirements for commonly discussed loss of control scenarios are a. markedly increased AI capabilities and b. the use of those capabilities in ways that undermine control. First, some future AI systems would need specific capabilities (significantly surpassing those of current systems) that allow them to undermine human control. Second, some AI systems would need to employ these ‘control-undermining capabilities’, either because they were intentionally designed to do so or because technical issues produce unintended behaviour." (p. 100)
2. "There are multiple versions of loss of control concerns, including versions that emphasise ‘passive’ loss of control (see Figure 2.5). In ‘passive’ loss of control scenarios, important decisions are delegated to AI systems, but the systems’ decisions are too opaque, complex, or fast to allow for or incentivise meaningful oversight. Alternatively, people stop exercising oversight because they strongly trust the systems’ decisions and are not required to exercise oversight (585, 589). These concerns are partly grounded in the ‘automation bias’ literature, which reports many cases of people complacently relying on recommendations from automated systems (590, 591)." (p. 101)
