Attacking LLMs via Additional Modalities a
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"LLMs can now process modalities other than text, e.g. images or video frames (OpenAI, 2023c; Gemini Team, 2023). Several studies show that gradient-based attacks on multimodal models are easy and effective (Carlini et al., 2023a; Bailey et al., 2023; Qi et al., 2023b). These attacks manipulate images that are input to the model (via an appropriate encoding). GPT-4Vision (OpenAI, 2023c) is vulnerable to jailbreaks and exfiltration attacks through much simpler means as well, e.g. writing jailbreaking text in the image (Willison, 2023a; Gong et al., 2023). For indirect prompt injection, the attacker can write the text in a barely perceptible color or font, or even in a different modality such as Braille (Bagdasaryan et al., 2023)."(p. 70)
Part of Vulnerability to Poisoning and Backdoors
Other risks from Anwar et al. (2024) (26)
Agentic LLMs Pose Novel Risks
7.2 AI possessing dangerous capabilitiesMulti-Agent Safety Is Not Assured by Single-Agent Safety
7.6 Multi-agent risksDual-Use Capabilities Enable Malicious Use and Misuse of LLMs
4.0 Malicious Actors & MisuseCorporate power may impeded effective governance
6.1 Power centralization and unfair distribution of benefitsJailbreaks and Prompt Injections Threaten Security of LLMs
2.2 AI system security vulnerabilities and attacksVulnerability to Poisoning and Backdoors
2.2 AI system security vulnerabilities and attacks