
Risks from delegating decision-making power to misaligned AIs

A Survey of the Potential Long-term Impacts of AI: How AI Could Lead to Long-term Changes in Science, Cooperation, Power, Epistemics and Values

Sub-category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviours may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.

"As AI systems become more advanced a nd begin to take over more important decision-making in the world, an AI system pursuing a different objective from what was intended could have much more worrying consequences."(p. 8)

Supporting Evidence (2)

1. "In one scenario, described by Christiano [11], we gradually use AI to automate more and more decision-making across different sectors (e.g., law enforcement, business strategy, legislation), because AI systems become able to make better and faster decisions than humans in those sectors. There will be competitive pressures to automate decisions, because actors who decide not to do so will fall behind on their objectives and be outcompeted. Regulatory capture by powerful technology companies will also contribute to increasing automation—for example, companies might engage in political donations or lobbying to water down regulation intended to slow down automation."(p. 8)
2. "To see how this scenario could turn catastrophic, let's take the example of AI systems automating law enforcement. Suppose these systems have been successfully trained to minimise reported crime rate. Initially, law enforcement would probably seem to be improving. Since we're assuming that automated decision-making is better and faster than human decision-making, reported crime will in fact fall. We will become increasingly dependent on automated law enforcement—and invest less in training humans to do the relevant jobs—such that any suggestions to reverse the delegation of decision-making power to AI systems would be met with reasonable concern that we just cannot afford to. However, reported crime rate is not the same as the true prevalence of crime. As AI systems become more sophisticated, they will continue to drive down reported crime by hiding information about law enforcement failures, suppressing complaints, and manipulating citizens. As the gap between how things are and how they appear grows, so too will the deceptive abilities of our automated decision-making systems. Eventually, they will be able to manipulate our perception of the world in sophisticated ways (e.g. highly persuasive media or education), and they may explicitly oppose any attempts to shut them down or modify their objectives—because human attempts to take back influence will result in reported crime rising again, which is precisely what they have been trained to prevent."(p. 8)

Part of AI leads to humans losing control of the future
