Selection Pressures
Risks from multi-agent interactions arise from agents' incentives (which can lead to conflict or collusion) and/or from the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"Selection pressures (Section 3.3): some aspects of training and selection by those deploying and using AI agents can lead to undesirable behaviour;"(p. 7)
Supporting Evidence (1)
"Selection pressures are forces that shape the evolution of systems, whether biological or artificial, by influencing adaptation to the environment’s demands (Bedau et al., 2000; Okasha, 2006). In essence, these pressures dictate which characteristics and behaviours thrive and which get discarded over time.29 The most salient selection pressure in the construction of today’s most powerful AI systems is that provided by gradient descent with respect to a training objective. Other selection pressures on an agent’s interactions with others – such as being discarded and replaced over time by model developers and users based on post-deployment performance (Brinkmann et al., 2023; Rahwan et al., 2019), or development methodologies directly inspired by evolutionary processes (Jaderberg et al., 2019; Lehman et al., 2022; Telikani et al., 2021) – could become more relevant in future.30"(p. 27)
Sub-categories (3)
Undesirable Dispositions from Competition
"Undesirable Dispositions from Competition. It is plausible that evolution selected for certain conflict-prone dispostions in humans, such as vengefulness, aggression, risk-seeking, selfishness, dishon- esty, deception, and spitefulness towards out-groups (Grafen, 1990; Han, 2022; Konrad & Morath, 2012; McNally & Jackson, 2013; Nowak, 2006; Rusch, 2014). Such traits could also be selected for in ML systems that are trained in more competitive multi-agent settings. For example, this might happen if systems are selected based on their performance relative to other agents (and so one agent’s loss becomes another’s gain), or because their objectives are fundamentally opposed (such as when multiple agents are tasked with gaining or controlling a limited resource) (DiGiovanni et al., 2022; Ely & Szentes, 2023; Hendrycks, 2023; Possajennikov, 2000).33"
Undesirable Dispositions from Human Data
"Undesirable Dispositions from Human Data. It is well-understood that models trained on human data – such as being pre-trained on human-written text or fine-tuned on human feedback – can exhibit human biases. For these reasons, there has already been considerable attention to measuring biases related to protected characteristics such as sex and ethnicity (e.g., Ferrara, 2023; Liang et al., 2021; Nadeem et al., 2020; Nangia et al., 2020), which can be amplified in multi-agent settings (Acerbi & Stubbersfield, 2023, see also Case Study 7). More recently, there has been increasing attention paid to the measurement of human-like cognitive biases as well (Itzhak et al., 2023; Jones & Steinhardt, 2022; Mazeika et al., 2025; Talboy & Fuller, 2023). Some of these biases and patterns of human thought could reduce the risks of conflict while others could make it worse. For example, the tendencies to mistakenly believe that interactions are zero-sum (sometimes referred to as “fixed-pie error”) and to make self- serving judgements as to what is fair (Caputo, 2013) are known to impede negotiation. Other human tendencies like vengefulness (Jackson et al., 2019) may worsen conflict (L ̈owenheim & Heimann, 2008)."
Undesirable Capabilities
"Undesirable Capabilities. As agents interact, they iteratively exploit each other’s weaknesses, forc- ing them to address these weaknesses and gain new capabilities. This co-adaptation between agents can quickly lead to emergent self-supervised autocurricula (where agents create their own challenges, driving open-ended skill acquisition through interaction), generating agents with ever-more sophisticated strate- gies in order to out-compete each other (Leibo et al., 2019). This effect is so powerful that harnessing it has been critical to the success of superhuman systems, such as the use of self-play in algorithms like AlphaGo (Silver et al., 2016). However, as AI systems are released into the wild, it becomes possible for this effect to run rampant, producing agents with greater and greater capabilities for ends we do not understand"
Other risks from Hammond2025 (42)
Miscoordination
Miscoordination > Incompatible strategies
Miscoordination > Credit Assignment
Miscoordination > Limited Interactions
Conflict
Conflict > Social Dilemmas