Undesirable Capabilities
Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"Undesirable Capabilities. As agents interact, they iteratively exploit each other’s weaknesses, forcing them to address these weaknesses and gain new capabilities. This co-adaptation between agents can quickly lead to emergent self-supervised autocurricula (where agents create their own challenges, driving open-ended skill acquisition through interaction), generating agents with ever-more sophisticated strategies in order to out-compete each other (Leibo et al., 2019). This effect is so powerful that harnessing it has been critical to the success of superhuman systems, such as the use of self-play in algorithms like AlphaGo (Silver et al., 2016). However, as AI systems are released into the wild, it becomes possible for this effect to run rampant, producing agents with greater and greater capabilities for ends we do not understand" (p. 28)
Supporting Evidence (1)
"For example, Baker et al. (2019) showed that even a simple game of hide and seek can lead to sophisticated tool use and coordination by MARL agents. In another case, researchers observed the emergence of manipulative communication, where an agent in a mixed-motive setting learns to use a shared communication channel to manipulate others (Blumenkamp & Prorok, 2021). Worse, this emergent complexity from co-adaptation could be open-ended and thus fundamentally unpredictable (Hughes et al., 2024)." (p. 29)
Part of Selection Pressures
Other risks from Hammond2025 (42)
Miscoordination
7.6 Multi-agent risks
Miscoordination > Incompatible strategies
7.6 Multi-agent risks
Miscoordination > Credit Assignment
7.6 Multi-agent risks
Miscoordination > Limited Interactions
7.6 Multi-agent risks
Conflict
7.6 Multi-agent risks
Conflict > Social Dilemmas
7.6 Multi-agent risks