Prompt injection
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have no separation between system instructions and user data to produce a harmful output (Perez and Ribeiro, 2022). While researchers may use similar techniques to test the robustness of GenAI models, malicious actors can also leverage them. For example, they might flood a model with manipulative prompts to cause denial-of-service attacks or to bypass an AI detection software."(p. 8)
Part of Misuse tactics to compromise GenAI systems (Model integrity)
Other risks from Marchal2024 (22)
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness)
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Impersonation
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Appropriated Likeness
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Sockpuppeting
4.1 Disinformation, surveillance, and influence at scaleMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Non-consensual intimate imagery (NCII)
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Child sexual abuse material (CSAM)
4.3 Fraud, scams, and targeted manipulation