Home/Risks/Clarke2023/Risks from AIs developing goals and values that are different from humans

Risks from AIs developing goals and values that are different from humans

A Survey of the Potential Long-term Impacts of AI: How AI Could Lead to Long-term Changes in Science, Cooperation, Power, Epistemics and Values

Risk Domain

7. AI System Safety, Failures & Limitations

Sub-category

7.1 AI pursuing its own goals in conflict with human goals or values

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, and situational awareness to seek power, self-proliferate, or achieve other goals.

"The main concern here is that we might develop advanced AI systems whose goals and values are different from those of humans, and are capable enough to take control of the future away from humanity." (p. 8)

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing: Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Supporting Evidence (1)

"The system could learn the objective “maximise the contents of the memory cell where the score is stored” which, over the long run, will lead it to fool the humans scoring its behaviour into thinking that it is doing what they intended, and eventually seize control over that memory cell, and eliminate actors who might try to interfere with this. When the intended task requires performing complex actions in the real world, this alternative strategy would probably allow the system to get much higher scores, much more easily, than successfully performing the task as intended.
• Suppose that some system is being trained to further some company’s objective. This system could learn the objective “maximise quarterly revenue” which, over the long run, would lead it to (e.g.) collude with auditors valuing the company’s output, fool the company’s directors, and eventually ensure no actor who might reduce the company’s revenue can interfere." (p. 8)

Part of: AI leads to humans losing control of the future