AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.
"One way we might lose control of an AI agent’s actions is if it engages in behavior known as “proxy gaming.” It is often difficult to specify and measure the exact goal that we want a system to pursue. Instead, we give the system an approximate—“proxy”—goal that is more measurable and seems likely to correlate with the intended goal. However, AI systems often find loopholes by which they can easily achieve the proxy goal, but completely fail to achieve the ideal goal. If an AI “games” its proxy goal in a way that does not reflect our values, then we might not be able to reliably steer its behavior."(p. 35)
Supporting Evidence (2)
"Correctly specifying goals is no trivial task. If delineating exactly what we want from a boat racing AI is tricky, capturing the nuances of human values under all possible scenarios will be much harder. Philosophers have been attempting to precisely describe morality and human values for millennia, so a precise and flawless characterization is not within reach. Although we can refine the goals we give AIs, we might always rely on proxies that are easily definable and measurable. Discrepancies between the proxy goal and the intended function arise for many reasons. Besides the difficulty of exhaustively specifying everything we care about, there are also limits to how much we can oversee AIs, in terms of time, computational resources, and the number of aspects of a system that can be monitored. Additionally, AIs may not be adaptive to new circumstances or robust to adversarial attacks that seek to misdirect them. As long as we give AIs proxy goals, there is the chance that they will find loopholes we have not thought of, and thus find unexpected solutions that fail to pursue the ideal goal."(p. 36)
"The more intelligent an AI is, the better it will be at gaming proxy goals. Increasingly intelligent agents can be increasingly capable of finding unanticipated routes to optimizing proxy goals without achieving the desired outcome [118]. Additionally, as we grant AIs more power to take actions in society, for example by using them to automate certain processes, they will have access to more means of achieving their goals. They may then do this in the most efficient way available to them, potentially causing harm in the process. In a worst case scenario, we can imagine a highly powerful agent optimizing a flawed objective to an extreme degree without regard for human life. This represents a catastrophic risk of proxy gaming"(p. 36)
Part of Rogue AIs (Internal)
Other risks from Hendrycks, Mazzeika & Woodside (2023) (13)
Malicious Use (Intentional)
4.0 Malicious Actors & MisuseMalicious Use (Intentional) > Bioterrorism
4.2 Cyberattacks, weapon development or use, and mass harmMalicious Use (Intentional) > Unleashing AI Agents
4.2 Cyberattacks, weapon development or use, and mass harmMalicious Use (Intentional) > Persuasive AIs
4.1 Disinformation, surveillance, and influence at scaleMalicious Use (Intentional) > Concentration of Power
6.1 Power centralization and unfair distribution of benefitsAI Race (Environmental/Structural)
6.4 Competitive dynamics