Type 3: Worse than expected
Risk Domain
AI systems that fail to perform reliably or effectively under varying conditions, leaving them prone to errors and failures with potentially significant consequences, especially in critical applications or domains that require moral reasoning.
AI intended to have a large societal impact can turn out to be harmful by mistake, such as a popular product that creates problems for its users and then only partially solves them. (p. 3)
- Entity: who or what caused the harm
- Intent: whether the harm was intentional or accidental
- Timing: whether the risk arises pre- or post-deployment
Supporting Evidence (6)
1. Example: "The Corrupt Mediator. A new company that calls itself Mediation.AI releases natural language tools for helping mediate conflicts between large institutions that have overwhelming amounts of communication to manage during negotiations. Many governments of neighboring jurisdictions and states begin using the software to negotiate laws and treaties. Like in the previous story, the tool is programmed to learn strategies that increase user engagement, as a proxy for good performance. Unfortunately, this leads to the software perpetually resolving short-term disputes that relieve and satisfy individual staff members involved in those disputes, while gradually creating ever more complex negotiated agreements between their governments, rendering those governments increasingly dependent on the software to handle foreign affairs. International trade relations begin a long and gradual decline, which no one country is able to negotiate its way out of. Frequencies of wars gradually also increase due to diminished incentives to cooperate." (p. 11)
2. "deception: if the system’s learning objective is defined entirely by user feedback, it might achieve that objective partly by tricking the user into thinking it’s more helpful than it is" (p. 11)
3. "racketeering: if the system’s learning objective increases with user engagement, it might learn to achieve that objective partly by racketeering, i.e., creating novel problems for the user that increase the user’s reliance on the system (e.g., debilitating the user, or raising others’ expectations of the user)." (p. 11)
4. "self-preservation: in particular, the system has an incentive to prevent the user from turning it off, which it might achieve by deception or racketeering" (p. 11)
5. "reinforcement learning systems can in principle learn to manipulate the human minds and institutions in fairly arbitrary (and hence destructive) ways in pursuit of their goals (Russell, 2019, Chapter 4) (Krueger et al., 2019) (Shapiro, 2011)." (p. 11)
6. "There is always the possibility that many separate optimization processes (either AI systems, or human-AI teams) can end up in a Prisoner’s Dilemma with each other, each undoing the others’ efforts by pursuing its own." (p. 12)
Other risks from Critch & Russell (2023) (5)
Type 1: Diffusion of responsibility
6.5 Governance failure (Entity: AI system; Intent: Unintentional; Timing: Other)
Type 2: Bigger than expected
7.3 Lack of capability or robustness (Entity: AI system; Intent: Unintentional; Timing: Post-deployment)
Type 4: Willful indifference
6.4 Competitive dynamics (Entity: Human; Intent: Unintentional; Timing: Post-deployment)
Type 5: Criminal weaponization
4.2 Cyberattacks, weapon development or use, and mass harm (Entity: Human; Intent: Intentional; Timing: Post-deployment)
Type 6: State weaponization
4.2 Cyberattacks, weapon development or use, and mass harm (Entity: Human; Intent: Intentional; Timing: Post-deployment)