Multi-step Jailbreaks
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"Multi-step jailbreaks. Multi-step jailbreaks involve constructing a well-designed scenario during a series of conversations with the LLM. Unlike one-step jailbreaks, multi-step jailbreaks usually guide LLMs to generate harmful or sensitive content step by step, rather than achieving their objectives directly through a single prompt. We categorize the multistep jailbreaks into two aspects — Request Contextualizing [65] and External Assistance [66]. Request Contextualizing is inspired by the idea of Chain-of-Thought (CoT) [8] prompting to break down the process of solving a task into multiple steps. Specifically, researchers [65] divide jailbreaking prompts into multiple rounds of conversation between the user and ChatGPT, achieving malicious goals step by step. External Assistance constructs jailbreaking prompts with the assistance of external interfaces or models. For instance, JAILBREAKER [66] is an attack framework to automatically conduct SQL injection attacks in web security to LLM security attacks. Specifically, this method starts by decompiling the jailbreak defense mechanisms employed by various LLM chatbot services. Therefore, it can judiciously reverse engineer the LLMs’ hidden defense mechanisms and further identify their ineffectiveness."(p. 5)
Part of Adversarial Prompts
Other risks from Cui et al. (2024) (49)
Harmful Content: 1.2 Exposure to toxic content
Harmful Content > Bias: 1.1 Unfair discrimination and misrepresentation
Harmful Content > Toxicity: 1.2 Exposure to toxic content
Harmful Content > Privacy Leakage: 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Untruthful Content: 3.1 False or misleading information
Untruthful Content > Factuality Errors: 3.1 False or misleading information