Coercion and Extortion
Risks arising from multi-agent interactions, driven both by incentives (which can lead to conflict or collusion) and by the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"Advanced AI systems might also lead to various forms of coercion and extortion in less extreme settings (Ellsberg, 1968; Harrenstein et al., 2007). These threats might target humans directly (such as the revelation of private information extracted by advanced AI surveillance tools), or other AI systems that are deployed on behalf of humans (such as by hacking a system to limit its resources or operational capacity; see also Section 3.7). Increasing AI cyber-offensive capabilities – including those that target other AI systems via adversarial attacks and jailbreaking (Gleave et al., 2020; Yamin et al., 2021; Zou et al., 2023) – without a commensurate increase in defensive capabilities could make this form of conflict cheaper, more widespread, and perhaps also harder to detect (Brundage et al., 2018). Addressing these issues requires design strategies that prevent AI systems from exploiting, or being susceptible to, such coercive tactics."(p. 14)
Part of Conflict
Other risks from Hammond2025 (42)
7.6 Multi-agent risks > Miscoordination
7.6 Multi-agent risks > Miscoordination > Incompatible strategies
7.6 Multi-agent risks > Miscoordination > Credit Assignment
7.6 Multi-agent risks > Miscoordination > Limited Interactions
7.6 Multi-agent risks > Conflict
7.6 Multi-agent risks > Conflict > Social Dilemmas
7.6 Multi-agent risks