
Loss of control

International Scientific Report on the Safety of Advanced AI

Bengio et al. (2024)

Sub-category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviours may be introduced by humans during design and development (for example, through reward hacking or goal misgeneralisation), or may result from AI systems using dangerous capabilities, such as manipulation, deception, and situational awareness, to seek power, self-proliferate, or achieve other goals.

"'Loss of control’ scenarios are potential future scenarios in which society can no longer meaningfully constrain some advanced general- purpose AI agents, even if it becomes clear they are causing harm. These scenarios are hypothesised to arise through a combination of social and technical factors, such as pressures to delegate decisions to general- purpose AI systems, and limitations of existing techniques used to influence the behaviours of general- purpose AI systems."(p. 51)

Supporting Evidence (6)

1. "AI companies and researchers are increasingly interested in developing general-purpose AI ‘agents’ (sometimes also referred to as ‘autonomous general-purpose AI systems’). General-purpose AI agents are systems that can autonomously interact with the world, plan ahead, and pursue goals. Although general-purpose AI agents are beginning to be developed, they still demonstrate only very limited capabilities (26, 178, 480*). Various researchers and AI labs ultimately hope to create general-purpose AI agents that can operate and accomplish long-term tasks with little or no human oversight or intervention. Autonomous general-purpose AI systems, if fully realised, could be useful in many sectors. However, some researchers worry about risks from their malicious use, or from accidents and unintended consequences of their deployment (481*, 482)." (p. 51)
2. "Some researchers have also expressed concern about society’s ability to exercise reliable oversight and control over autonomous general-purpose AI systems. For decades, concerns about a potential loss of control have been raised by computer scientists looking ahead toward these kinds of AI systems, including AI pioneers such as Alan Turing (483), I. J. Good (484), and Norbert Wiener (485). These concerns have gained more prominence recently (486), partly because a subset of researchers now believe that sufficiently advanced general-purpose AI agents could be developed sooner than previously thought (127, 487, 488)." (p. 51)
3. "An AI system is considered ‘controllable’ when its behaviours can be meaningfully determined or constrained by humans. While a lack of control is not intrinsically harmful, it significantly increases the risks of various harms. Current general-purpose AI systems are generally considered to be controllable, but, if autonomous general-purpose AI systems are fully developed, then the risk of loss of control may grow considerably." (p. 52)
4. "Some mathematical findings suggest that future general-purpose AI agents may use strategies that hinder human control, but as yet it is unclear how well these findings will apply to real-world general-purpose AI systems. Some mathematical models of idealised goal-directed AI agents have found that, with sufficiently advanced planning capabilities, many such AI agents would hinder human attempts to interfere with their goal pursuit (493, 494, 495*, 496). Similar mathematical findings suggest that many such AI agents could have a tendency to ‘seek power’ by accumulating resources, interfering with oversight processes, and avoiding being deactivated, because these actions help them achieve their given goals (493, 494, 495*, 497*, 498, 499)." (p. 52) (An illustrative sketch of one such formalisation appears after this list.)
5. "If people entrust general-purpose AI systems with increasingly critical responsibilities, then this could increase the risk of loss of control. A range of social and economic forces would influence the interaction between human and autonomous agents in such scenarios. For example, economic pressures may favour general-purpose AI-enabled automation in the absence of intervention, despite potentially negative consequences (502), and human over-reliance on general-purpose AI agents would make it harder to exercise oversight (481*). Using general-purpose AI agents to automate decision-making in government, military, or judicial applications might elevate concerns over AI’s influence on important societal decisions (503, 504, 505, 506). As a more extreme case, some actors have stated an interest in purposefully developing uncontrolled AI agents (507)." (p. 53)
6. "Certain specific capabilities could disproportionately increase the risk of loss of control. These capabilities – which are currently limited – include identifying and exploiting software vulnerabilities, persuasion, automating AI research and development, and capabilities needed to autonomously replicate and adapt (367*, 508*, 509). The relevant sections in this report discuss how capable current general-purpose AI systems are in some of these areas (4.1.3 Cyber offence, 4.1.2 Disinformation and manipulation of public opinion, 4.1.4 Dual use science risks). Particularly relevant are agent capabilities, which increase the ability for general-purpose AI systems to operate autonomously, such as planning and using memory. These are discussed in 4.4.1 Cross-cutting technical risk factors." (p. 53)

Part of Risks from Malfunctions
