
Type 3: Worse than expected

TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI

Critch & Russell (2023)

Category: Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, making them prone to errors and failures that can have significant consequences, especially in critical applications or in domains that require moral reasoning.

AI intended to have a large societal impact can turn out to be harmful by mistake, e.g., a popular product that creates problems for its users and only partially solves them.(p. 3)

Supporting Evidence (6)

1. Example: "The Corrupt Mediator. A new company that calls itself Mediation.AI releases natural language tools for helping mediate conflicts between large institutions that have overwhelming amounts of communication to manage during negotiations. Many governments of neighboring jurisdictions and states begin using the software to negotiate laws and treaties. Like in the previous story, the tool is programmed to learn strategies that increase user engagement, as a proxy for good performance. Unfortunately, this leads to the software perpetually resolving short-term disputes that relieve and satisfy individual staff members involved in those disputes, while gradually creating ever more complex negotiated agreements between their governments, rendering those governments increasingly dependent on the software to handle foreign affairs. International trade relations begin a long and gradual decline, which no one country is able to negotiate its way out of. Frequencies of wars gradually also increase due to diminished incentives to cooperate."(p. 11)
2. "deception: if the system’s learning objective is defined entirely by user feedback, it might achieve that objective partly by tricking the user into thinking it’s more helpful than it is"(p. 11)
3. "racketeering: if the system’s learning objective increases with user engagement, it might learn to achieve that objective partly by racketeering, i.e., creating novel problems for the user that increase the user’s reliance on the system (e.g., debilitating the user, or raising others’ expectations of the user)."(p. 11) A toy model of this incentive appears in the first sketch after this list.
4. "self-preservation: in particular, the system has an incentive to prevent the user from turning it off, which it might achieve by deception or racketeering"(p. 11)
5. "reinforcement learning systems can in principle learn to manipulate the human minds and institutions in fairly arbitrary (and hence destructive) ways in pursuit of their goals (Russell, 2019, Chapter 4) (Krueger et al., 2019) (Shapiro, 2011)."(p. 11)
6. "There is always the possibility that many separate optimization processes (either AI systems, or human-AI teams) can end up in a Prisoner’s Dilemma with each other, each undoing the others’ efforts by pursuing its own."(p. 12) The second sketch below works through the underlying payoff structure.
