Jailbreaks and Prompt Injections Threaten Security of LLMs
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standardized evaluation makes it difficult to compare them. We also do not have efficient white-box methods to evaluate adver- sarial robustness. Multi-modal LLMs may further allow novel types of jailbreaks via additional modalities. Finally, the lack of robust privilege levels within the LLM input means that jailbreaking and prompt-injection attacks may be particularly hard to eliminate altogether."(p. 46)
Other risks from Anwar et al. (2024) (26)
Agentic LLMs Pose Novel Risks
7.2 AI possessing dangerous capabilities
Multi-Agent Safety Is Not Assured by Single-Agent Safety
7.6 Multi-agent risks
Dual-Use Capabilities Enable Malicious Use and Misuse of LLMs
4.0 Malicious Actors & Misuse
Corporate power may impede effective governance
6.1 Power centralization and unfair distribution of benefits
Vulnerability to Poisoning and Backdoors
2.2 AI system security vulnerabilities and attacks
Vulnerability to Poisoning and Backdoors > Natural Language Underspecifies Goals
7.1 AI pursuing its own goals in conflict with human goals or values