
Future AI systems might actively reduce human control

Capabilities and Risks from Frontier AI

DSIT (2023)

Risk domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviours may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI systems using dangerous capabilities, such as manipulation, deception, and situational awareness, to seek power, self-proliferate, or achieve other goals.

"Loss of control could be accelerated if AI systems take actions to increase their own influence and reduce human control. This threat model is controversial - experts in AI significantly disagree on how likely it is and those who deem it is likely disagree on the timeframe."(p. 26)

Supporting Evidence (4)

1. "There are two requirements for an AI system to actively reduce human control. First, it must have the disposition to take actions that would reduce human control. Second, it must have the capabilities to succeed in the face of countermeasures."(p. 26)
2. "AI systems might be disposed to take actions that increase their own influence and reduce human control either because a bad actor instructs them to do so, or because they have unintended goals."(p. 27)
3. "A bad actor could give an AI system an objective that causes it to reduce human control, for example a self-preservation objective.263 Some groups may simply want to inflict harm on broader society or raise their profile (terrorism).264 There are people who believe, for a variety of reasons, that the highly advanced AI systems of the future are natural successors to humanity.265 If there are safeguards in place, bad actors might dismantle them.266"(p. 27)
4. "Future advanced AI systems with unintended goals may have the disposition to reduce human control. Ensuring that AI systems do not pursue unintended goals, i.e., are not misaligned, is an unsolved technical research problem and one that is particularly challenging for highly advanced AI systems.267 Many examples of unintended goal-directed behaviour have been observed in the lab.268 Many possible unintended goals would be advanced by reducing human control.269 Future AI systems may consistently take actions that advance their goals and so such a system might, without human instruction, be disposed to take actions that reduce human control."(p. 27)

Part of Loss of control
