Multi-Agent Security
Risks from multi-agent interactions, driven by incentives (which can lead to conflict or collusion) and/or by the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"Multi-agent security (Section 3.7): multi-agent systems give rise to new kinds of security threats and vulnerabilities."(p. 7)
Supporting Evidence (2)
"Global cyber threats are on the rise, not just due to the proliferation of commercial cyber tools (NCSC, 2023), but also due to an increase in so-called ‘hybrid warfare’ (which blends conventional warfare with cyber- and information-warfare) by nation-states and non-state actors (CSIS, 2023; Kaunert & Ilbiz, 2021). The shift towards a world of advanced AI agents will not only enable new tools and affordances, but also increase the surface area for potential attacks, invalidating previously reasonable threat modelling assumptions and requiring a new approach: multi-agent security (Schroeder de Witt et al., 2023a)."(p. 39)
"Multi-agent security focuses on safeguarding complex networks of heterogeneous agents and the systems that they interact with. This includes protecting not only data and software but also hardware and other physical aspects of the world that are integrated with these digital systems.53 While many security settings are implicitly multi-agent (involving both an attacker and a defender), multi-agent security ad- dresses vulnerabilities and attack vectors that emerge specifically when many AI agents interact within a broader networked ecosystem.54 For example, traditional security frameworks such as zero-trust ap- proaches may not provide the required trade-offs between security and capability in large multi-agent systems (Wylde, 2021)."(p. 39)
Sub-categories (6)
Swarm Attacks
"Swarm Attacks. The need for multi-agent security is foreshadowed by attacks today that benefit from the use of many decentralised agents, such as distributed denial-of-service attacks (Cisco, 2023; Yoachimik & Pacheco, 2024). Such attacks exploit the massive collective resources of individual low- resourced actors, chained into an attack that breaks the assumptions of bandwidth constraints on a single well-resourced agent."
Heterogeneous Attacks
"Heterogeneous Attacks. A closely related risk is the possibility of multiple agents combining different affordances to overcome safeguards, for which there is already preliminary evidence (Jones et al., 2024, see also Case Study 12). In this case, it is not the sheer number of agents that leads to the novel attack method, but the combination of their different abilities. This might include the agents’ lack of individual safeguards, tasks that they have specialised to complete, systems or information that they may have access to (either directly or via training), or other incidental features such as their geographic location(s). The inherent difficulty of attributing responsibility for security breaches in diffuse, heterogeneous networks of agents further complicates timely defence and recovery (Skopik & Pahi, 2020)."
Social Engineering at Scale
"Social Engineering at Scale. Advanced AI agents will be more easily able to interact with large numbers of humans, and vice versa. This provides a wider attack surface for various forms of automated social engineering (Ai et al., 2024). For example, coordinated agents could use advanced surveillance tools and produce personalized phishing or manipulative content at scale, adjusting their tactics based on user feedback (Figueiredo et al., 2024; Hazell, 2023). A large number of subtle interactions with a range of seemingly independent AI agents might be more likely to lead to someone being persuaded or manipulated compared to an interaction with a single agent. Moreover, splitting these efforts among many specialized agents could make it harder for corporate or personal security measures to detect and neutralize such campaigns."
Vulnerable AI Agents
"Vulnerable AI Agents. The use of AI agents as delegates or representatives of humans or organisa- tions also introduces the possibility of attacks on AI agents themselves. In other words, agents can be considered vulnerable extensions of their principals, introducing a novel attack surface (SecureWorks, 2023). Attacks on an AI agent could be used to extract private information about their principal (Wei & Liu, 2024; Wu et al., 2024a), or to manipulate the agent to take actions that the principal would find undesirable (Zhang et al., 2024a). This includes attacks that have direct relevance for ensuring safety, such as attacks on overseer agents (see Case Study 13), attempts to thwart cooperation (Huang et al., 2024; Lamport et al., 1982), and the leakage of information (accidentally or deliberately) that could be used to enable collusion (Motwani et al., 2024)."
Cascading Security Failures
"Cascading Security Failures. Localised attacks in multi-agent systems can result in catastrophic macroscopic outcomes (Motter & Lai, 2002, see also Sections 3.2 and 3.4). These cascades can be hard to mitigate or recover from because component failure may be difficult to detect or localise in multi-agent systems (Lamport et al., 1982), and authentication challenges can facilitate false flag attacks (Skopik & Pahi, 2020). Computer worms represent a classic example of a cybersecurity threat that relies inherently on networked systems. Recent work has provided preliminary evidence that similar attacks can also be effective against networks of LLM agents (Gu et al., 2024; Ju et al., 2024; Lee & Tiwari, 2024, see also Case Study 8)."
Undetectable Threats
"Undetectable Threats. Cooperation and trust in many multi-agent systems relies crucially on the ability to detect (and then avoid or sanction) adversarial actions taken by others (Ostrom, 1990; Schneier, 2012). Recent developments, however, have shown that AI agents are capable of both steganographic communication (Motwani et al., 2024; Schroeder de Witt et al., 2023b) and ‘illusory’ attacks (Franzmeyer et al., 2023), which are black-box undetectable and can even be hidden using white-box undetectable encrypted backdoors (Draguns et al., 2024). Similarly, in environments where agents learn from interac- tions with others, it is possible for agents to secretly poison the training data of others (Halawi et al., 2024; Wei et al., 2023). If left unchecked, these new attack methods could rapidly destabilise cooperation and coordination in multi-agent systems."
Other risks from Hammond2025 (42)
Miscoordination
Miscoordination > Incompatible strategies
Miscoordination > Credit Assignment
Miscoordination > Limited Interactions
Conflict
Conflict > Social Dilemmas