AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.
The 2010 flash crash is an example of a runaway process caused by interacting algorithms. Runaway processes are characterised by feedback loops that accelerate the process itself. Typically, these feedback loops arise from the interaction of multiple agents in a population... Within highly complex systems, the emergence of runaway processes may be hard to predict, because the conditions under which positive feedback loops occur may be non-obvious. The system of interacting AI assistants, their human principals, other humans and other algorithms will certainly be highly complex. Therefore, there is ample opportunity for the emergence of positive feedback loops. This is especially true because the society in which this system is embedded is culturally evolving, and because the deployment of AI assistant technology itself is likely to speed up the rate of cultural evolution – understood here as the process through which cultures change over time – as communications technologies are wont to do (Kivinen and Piiroinen, 2023). This will motivate research programmes aimed at identifying positive feedback loops early on, at understanding which capabilities and deployments dampen runaway processes and which ones amplify them, and at building in circuit-breaker mechanisms that allow society to escape from potentially vicious cycles which could impact economies, government institutions, societal stability or individual freedoms (see Chapters 8, 16 and 17). The importance of circuit breakers is underlined by the observation that the evolution of human cooperation may well be ‘hysteretic’ as a function of societal conditions (Barfuss et al., 2023; Hintze and Adami, 2015). This means that a small directional change in societal conditions may, on occasion, trigger a transition to a defective equilibrium which requires a larger reversal of that change in order to return to the original cooperative equilibrium. We would do well to avoid such tipping points. Social media provides a compelling illustration of how tipping points can undermine cooperation: content that goes ‘viral’ tends to involve negativity bias and sometimes challenges core societal values (Mousavi et al., 2022; see Chapter 16). Nonetheless, the challenge posed by runaway processes should not be regarded as uniformly problematic. When harnessed appropriately and suitably bounded, we may even recruit them to support beneficial forms of cooperative AI. For example, it has been argued that economically useful ideas are becoming harder to find, thus leading to low economic growth (Bloom et al., 2020). By deploying AI assistants in the service of technological innovation, we may once again accelerate the discovery of ideas. New ideas, discovered in this way, can then be incorporated into the training data set for future AI assistants, thus expanding the knowledge base for further discoveries in a compounding way. In a similar vein, we can imagine AI assistant technology accumulating various capabilities for enhancing human cooperation, for instance by mimicking the evolutionary processes that have bootstrapped cooperative behavior in human society (Leibo et al., 2019). When used in these ways, the potential for feedback cycles that enable greater cooperation is a phenomenon that warrants further research and potential support."(p. 143)
Part of Cooperation
Other risks from Gabriel et al. (2024) (69)
Capability failures
7.3 Lack of capability or robustnessCapability failures > Lack of capability for task
7.3 Lack of capability or robustnessCapability failures > Difficult to develop metrics for evaluating benefits or harms caused by AI assistants
6.5 Governance failureCapability failures > Safe exploration problem with widely deployed AI assistants
7.3 Lack of capability or robustnessGoal-related failures
7.1 AI pursuing its own goals in conflict with human goals or valuesGoal-related failures > Misaligned consequentialist reasoning
7.3 Lack of capability or robustness