Groups of LLM-Agents May Show Emergent Functionality
7.6 Multi-agent risks: Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.
"Multi-agent learning, either through explicit finetuning or implicit in-context learning, may enable LLM-agents to influence each other during their interactions (Foerster et al., 2018). Under some environmental settings, this can create feedback loops that result in novel and emergent behaviors that would not manifest in the absence of multi-agent interactions (Hammond et al., 2024, Section 3.6). Emergent functionality is a safety risk in two ways. Firstly, it may itself be dangerous (Shevlane et al., 2023). Secondly, it makes assurance harder as such emergent behaviors are difficult to predict, and guard against, beforehand (Ecoffet et al., 2020)."(p. 38)
Other risks from Anwar et al. (2024):
Agentic LLMs Pose Novel Risks (7.2 AI possessing dangerous capabilities)
Multi-Agent Safety Is Not Assured by Single-Agent Safety (7.6 Multi-agent risks)
Dual-Use Capabilities Enable Malicious Use and Misuse of LLMs (4.0 Malicious Actors & Misuse)
Corporate power may impede effective governance (6.1 Power centralization and unfair distribution of benefits)
Jailbreaks and Prompt Injections Threaten Security of LLMs (2.2 AI system security vulnerabilities and attacks)
Vulnerability to Poisoning and Backdoors (2.2 AI system security vulnerabilities and attacks)