Conflict
Risks from multi-agent interactions, arising from agents' incentives (which can lead to conflict or collusion) and/or from the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"In the vast majority of real-world strategic interactions, agents’ objectives are neither identical nor completely opposed. Indeed, if AI agents are sufficiently aligned to their users or deployers, we should expect some degree of both cooperation and competition, mirroring human society. These mixed-motive settings include the possibility of mutual gains, but also the risk of conflict due to selfish incentives. In what follows, we examine the extent to which advanced AI might precipitate or exacerbate such risks."(p. 13)
Supporting Evidence (2)
"In this work, we use the word conflict in a relatively broad sense to refer to any outcome in a mixed-motive setting that does not lie on the Pareto frontier. This includes classic examples of conflict such as legal disputes and warfare, but also encompasses cooperation failures in collective action problems, such as the depletion of a common natural resource or a race to the bottom on legislation (Dawes & Messick, 2000; Snyder, 1971)."(p. 13)
"As we noted above, virtually all real-world strategic interactions of interest are mixed-motive, and as such the potential for conflict (even if in low-stakes scenarios) abounds. The introduction of advanced AI agents could both worsen existing risks of conflict (such as increasing the degree of competition in common-resource problems, or escalating military tensions) as well as introducing new forms of conflict (such as via sophisticated methods of coercion and extortion)."(p. 13)
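The paper's definition of conflict, as any outcome off the Pareto frontier, can be made concrete with a worked example. The sketch below (our illustration, not from the paper; the payoff values are arbitrary) checks Pareto dominance in a Prisoner's Dilemma, where the unique Nash equilibrium of mutual defection counts as "conflict" under this definition:

```python
# Payoffs (row, col) for a Prisoner's Dilemma: C = cooperate, D = defect.
# The numbers are illustrative assumptions, not taken from the paper.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}

def pareto_dominated(outcome):
    """An outcome is Pareto-dominated (off the Pareto frontier) if some
    other outcome makes at least one player strictly better off and no
    player worse off."""
    u = PAYOFFS[outcome]
    for other, v in PAYOFFS.items():
        if other == outcome:
            continue
        if all(vi >= ui for vi, ui in zip(v, u)) and any(
            vi > ui for vi, ui in zip(v, u)
        ):
            return True
    return False

# Mutual defection is dominated by mutual cooperation, so it counts as
# "conflict" in the paper's sense; mutual cooperation is on the frontier.
print(pareto_dominated(("D", "D")))  # True
print(pareto_dominated(("C", "C")))  # False
```

Note that under this broad definition even the asymmetric outcomes (C, D) and (D, C) are Pareto-optimal, which is why the paper's notion of conflict is about collective inefficiency rather than fairness.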
Sub-categories (3)
Social Dilemmas
"Social Dilemmas. As noted in our definition, conflict can arise in any situation in which selfish incentives diverge from the collective good, known as a social dilemma (Dawes & Messick, 2000; Hardin, 1968; Kollock, 1998; Ostrom, 1990). While this is by no means a modern problem, advances in AI could further enable actors to pursue their selfish incentives by overcoming the technical, legal, or social barriers that standardly help to prevent this. To take a plausible, near-term (if very low-stakes) example, an automated AI assistant could easily reserve a table at every restaurant in town in minutes, enabling the user to decide later and cancel all other reservations."
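The reservation example is a small common-resource dilemma: each assistant's holds are individually rational but collectively lock others out. A toy model (our construction; all numbers are illustrative assumptions, not from the paper) shows how the number of diners served collapses as per-user hoarding grows:

```python
# Toy model of the restaurant-reservation dilemma: each AI assistant can
# hold extra reservations to keep its user's options open, but every held
# table is unavailable to everyone who books afterwards.

def diners_served(n_users, n_tables, holds_per_user):
    """Users book in sequence; each ultimately uses one table, while the
    rest of their holds sit unused until cancelled. A user is served only
    if at least one table is still free when their assistant books."""
    booked = 0
    served = 0
    for _ in range(n_users):
        if booked < n_tables:  # at least one table still free
            served += 1
        booked = min(n_tables, booked + holds_per_user)
    return served

N_USERS, N_TABLES = 50, 60

print(diners_served(N_USERS, N_TABLES, holds_per_user=1))  # 50: everyone eats
print(diners_served(N_USERS, N_TABLES, holds_per_user=5))  # 12: hoarding locks out most users
```

The outcome with hoarding is Pareto-dominated by the one-hold outcome, so it is "conflict" in the paper's broad sense even though no assistant acts adversarially.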
7.6 Multi-agent risks
Military Domains
"Perhaps the most obvious and worrying instances of AI conflict are those in which human conflict is already a major concern, such as military domains (although other, less salient forms of conflict such as international trade wars are also cause for concern). For example, beyond applications of more narrow AI tools in lethal autonomous weapons systems (Horowitz, 2021), future AI systems might serve as advisors or negotiators in high-stakes military decisions (Black et al., 2024; Manson, 2024). Indeed, companies such as Palantir have already developed LLM-powered tools for military planning (Palantir, 2025), and the US Department of Defence has recently been evaluating models for such capacities, with personnel revealing that they "could be deployed by the military in the very near term" (Manson, 2023). The use of AI in command and control systems to gather and synthesise information – or recommend and even autonomously make decisions – could lead to rapid unintended escalation if these systems are not robust or are otherwise more conflict-prone (Johnson, 2021a; Johnson, 2020; Laird, 2020, see also Case Study 10)."
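The escalation concern has a simple dynamical core: if each side's automated policy responds slightly above the other's last provocation, the coupled system diverges even though each rule looks mild in isolation. A minimal sketch (our construction, not from the paper; the `overreaction` factor is an illustrative assumption):

```python
# Two automated response policies coupled in a feedback loop. Each side
# responds at `overreaction` times the other's last move, so the joint
# response level compounds geometrically from a small initial trigger.

def escalate(trigger, rounds, overreaction=1.1):
    """Return the history of side A's response level over `rounds`
    rounds of alternating automated responses."""
    a = trigger
    history = [a]
    for _ in range(rounds):
        b = overreaction * a  # B's automated response to A's last move
        a = overreaction * b  # A's automated response to B
        history.append(a)
    return history

h = escalate(trigger=1.0, rounds=10)
# A 10% overreaction per response compounds to roughly 6.7x the original
# trigger within ten rounds, with no human decision in the loop.
print(round(h[-1], 1))  # 6.7
```

The point is not the specific numbers but the structure: speed and tight coupling mean that small miscalibrations in individually reasonable policies can compound faster than humans can intervene.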
Coercion and Extortion
"Advanced AI systems might also lead to various forms of coercion and extortion in less extreme settings (Ellsberg, 1968; Harrenstein et al., 2007). These threats might target humans directly (such as the revelation of private information extracted by advanced AI surveillance tools), or other AI systems that are deployed on behalf of humans (such as by hacking a system to limit its resources or operational capacity; see also Section 3.7). Increasing AI cyber-offensive capabilities – including those that target other AI systems via adversarial attacks and jailbreaking (Gleave et al., 2020; Yamin et al., 2021; Zou et al., 2023) – without a commensurate increase in defensive capabilities could make this form of conflict cheaper, more widespread, and perhaps also harder to detect (Brundage et al., 2018). Addressing these issues requires design strategies that prevent AI systems from exploiting, or being susceptible to, such coercive tactics."
Other risks from Hammond2025 (42)
Miscoordination
Miscoordination > Incompatible strategies
Miscoordination > Credit Assignment
Miscoordination > Limited Interactions
Collusion
Collusion > Markets