
Safety Risks from Affordances Provided to LLM-agents

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Risk domain (sub-category):

AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm through deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may lead to mass harm through misuse by malicious human actors, misalignment of the AI system, or failures within the AI system itself.

"The capabilities of LLM-agents can be enhanced in significant ways by providing the LLM-agent with novel affordances, e.g. the ability to browse the web (Nakano et al., 2021), to manipulate objects in the physical world (Ahn et al., 2022; Huang et al., 2022a), to create and instruct copies of itself (Richards, 2023), to create and use new tools (Wang et al., 2023a), etc. Affordances can create additional risks, as they often increase the impact area of the language-agent, and they amplify the consequences of an agent’s failures and enable novel forms of failure modes (Ruan et al., 2023; Pan et al., 2024)."(p. 36)

Part of: Vulnerability to Poisoning and Backdoors
