Jailbreaks and Prompt Injections Threaten Security of LLMs

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standardized evaluation makes it difficult to compare them. We also do not have efficient white-box methods to evaluate adver- sarial robustness. Multi-modal LLMs may further allow novel types of jailbreaks via additional modalities. Finally, the lack of robust privilege levels within the LLM input means that jailbreaking and prompt-injection attacks may be particularly hard to eliminate altogether."(p. 46)
