
Foundationality May Cause Correlated Failures

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Sub-category
Risk Domain

Risks from multi-agent interactions, due to incentives (which can lead to conflict or collusion) and/or the structure of multi-agent systems, which can create cascading failures, selection pressures, new security vulnerabilities, and a lack of shared information and trust.

"Another important characteristic of LLM development is foundationality — due to the expense of large-scale pretraining, many deployed instances share similar or identical learned components. Foundationality may both be a blessing and a curse. On the one hand, it may be possible to exploit the similarity in the design of LLM-agents to facilitate cooperation (Critch et al., 2022; Conitzer and Oesterheld, 2023; Oesterheld et al., 2023). On the other hand, foundationality may leave LLM-agents vulnerable to correlated failures both in terms of safety and capabilities due to increased output homogenization (Bommasani et al., 2022)." (p. 38)

Part of Vulnerability to Poisoning and Backdoors

Other risks from Anwar et al. (2024) (26)