
Multi-Agent Safety Is Not Assured by Single-Agent Safety

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Category
Risk Domain

Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.

"A foremost lesson of game theory is that optimal decision-making within a single-agent setting (i.e. selfishly optimizing for an agent’s own utility) can produce sub-optimal outcomes in the presence of other strategic agents. Failing to account for the strategic nature of other agents can cause an agent to adopt strategies under which potentially everyone, including the agent itself, ends up worse off (Schelling, 1981; Harsanyi, 1995; Roughgarden, 2005; Nisan, 2007). Examples include collective action problems (or ‘social dilemmas’) such as arms races or the depletion of common resources, as well as other kinds of market failures such as those caused by asymmetric information or negative externalities (Bator, 1958; Coase, 1960; Buchanan and Stubblebine, 1962; Kirzner, 1963; Dubey, 1986)."(p. 37)
