
Groups of LLM-Agents May Show Emergent Functionality

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Risk Domain

Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.

"Multi-agent learning, either through explicit finetuning or implicit in-context learning, may enable LLM-agents to influence each other during their interactions (Foerster et al., 2018). Under some environmental settings, this can create feedback loops that result in novel and emergent behaviors that would not manifest in the absence of multi-agent interactions (Hammond et al., 2024, Section 3.6). Emergent functionality is a safety risk in two ways. Firstly, it may itself be dangerous (Shevlane et al., 2023). Secondly, it makes assurance harder as such emergent behaviors are difficult to predict, and guard against, beforehand (Ecoffet et al., 2020)." (p. 38)

Part of Vulnerability to Poisoning and Backdoors
