AI leads to humans losing control of the future

A Survey of the Potential Long-term Impacts of AI: How AI Could Lead to Long-term Changes in Science, Cooperation, Power, Epistemics and Values

Category

Risk Domain

7 AI System Safety, Failures & Limitations

7.1 AI pursuing its own goals in conflict with human goals or values

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.

"The values that steer humanity’s future: humanity gaining more control over the future due to developments in AI, or losing our potential for gaining control, both seem possible. Much will depend on our ability to solve the alignment problem, who develops powerful AI first, and what they use it for. These long-term impacts of AI could be hugely important but are currently under-explored. We’ve attempted to structure some of the discussion and stimulate more research, by reviewing existing arguments and highlighting open questions. While there are many ways AI could in theory enable a flourishing future for humanity, trends of AI development and deployment in practice leave us concerned about long-lasting harms. We would particularly encourage future work that critically explores ways AI could have positive long-term impacts in more depth, such as by enabling greater cooperation or problem-solving around global challenges."(p. 9)

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing: Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Supporting Evidence (1)

"The obvious question is: why would we develop advanced AI systems that are willing and able to take control of the future? One major concern is that we don't yet have ways of designing AI systems that reliably do what their designers want. Instead, modern AI training14 works by (roughly speaking) tweaking a system's “parameters” many times, until it scores highly according to some given “training objective”, evaluated on some “training data”. For instance, the large language model GPT-3 [7] is trained by (roughly speaking) tweaking its parameters until it scores highly at “predicting the next word” on “text scraped from the internet”. However, this approach gives no guarantee that a system will continue to pursue the training objective as intended over the long run. Indeed, notice that there are many objectives a system could learn that will lead it to score highly on the training objective but which do not lead to desirable behaviour over the long run."(p. 8)

Sub-categories (2)

Risks from AIs developing goals and values that are different from humans

"The main concern here is that we might develop advanced AI systems whose goals and values are different from those of humans, and are capable enough to take control of the future away from humanity."

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: AI system; Intent: Intentional; Timing: Other

Risks from delegating decision-making power to misaligned AIs

"As AI systems become more advanced a nd begin to take over more important decision-making in the world, an AI system pursuing a different objective from what was intended could have much more worrying consequences."

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: AI system; Intent: Intentional; Timing: Other