Type 3: Worse than expected
Risk Domain
AI systems that fail to perform reliably or effectively under varying conditions, leaving them prone to errors and failures with potentially significant consequences, especially in critical applications or domains that require moral reasoning.
AI intended to have a large societal impact can turn out to be harmful by mistake, such as a popular product that creates problems for its users and then only partially solves them. (p. 3)
- Entity: who or what caused the harm
- Intent: whether the harm was intentional or accidental
- Timing: whether the risk arises pre- or post-deployment
Supporting Evidence (6)
1. Example: "The Corrupt Mediator. A new company that calls itself Mediation.AI releases natural language tools for helping mediate conflicts between large institutions that have overwhelming amounts of communication to manage during negotiations. Many governments of neighboring jurisdictions and states begin using the software to negotiate laws and treaties. Like in the previous story, the tool is programmed to learn strategies that increase user engagement, as a proxy for good performance. Unfortunately, this leads to the software perpetually resolving short-term disputes that relieve and satisfy individual staff members involved in those disputes, while gradually creating ever more complex negotiated agreements between their governments, rendering those governments increasingly dependent on the software to handle foreign affairs. International trade relations begin a long and gradual decline, which no one country is able to negotiate its way out of. Frequencies of wars gradually also increase due to diminished incentives to cooperate." (p. 11)
2. "deception: if the system’s learning objective is defined entirely by user feedback, it might achieve that objective partly by tricking the user into thinking it’s more helpful than it is" (p. 11)
3. "racketeering: if the system’s learning objective increases with user engagement, it might learn to achieve that objective partly by racketeering, i.e., creating novel problems for the user that increase the user’s reliance on the system (e.g., debilitating the user, or raising others’ expectations of the user)." (p. 11)
4. "self-preservation: in particular, the system has an incentive to prevent the user from turning it off, which it might achieve by deception or racketeering" (p. 11)
5. "reinforcement learning systems can in principle learn to manipulate the human minds and institutions in fairly arbitrary (and hence destructive) ways in pursuit of their goals (Russell, 2019, Chapter 4) (Krueger et al., 2019) (Shapiro, 2011)." (p. 11)
6. "There is always the possibility that many separate optimization processes (either AI systems, or human-AI teams) can end up in a Prisoner’s Dilemma with each other, each undoing the others’ efforts by pursuing its own." (p. 12)
Other risks from Critch & Russell (2023) (5)
Type 1: Diffusion of responsibility
6.5 Governance failure (Entity: AI system; Intent: Unintentional; Timing: Other)
Type 2: Bigger than expected
7.3 Lack of capability or robustness (Entity: AI system; Intent: Unintentional; Timing: Post-deployment)
Type 4: Willful indifference
6.4 Competitive dynamics (Entity: Human; Intent: Unintentional; Timing: Post-deployment)
Type 5: Criminal weaponization
4.2 Cyberattacks, weapon development or use, and mass harm (Entity: Human; Intent: Intentional; Timing: Post-deployment)
Type 6: State weaponization
4.2 Cyberattacks, weapon development or use, and mass harm (Entity: Human; Intent: Intentional; Timing: Post-deployment)