Adversarial input
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"Adversarial Inputs involve modifying individual input data to cause a model to malfunction. These modifications, which are often imperceptible to humans, exploit how the model makes decisions to produce errors (Wallace et al., 2019) and can be applied to text, but also to images, audio, or video (e.g. changing pixels in an image of a panda in a way that causes a model to label it as a gibbon).6"(p. 8)
Part of Misuse tactics to compromise GenAI systems (Model integrity)
Other risks from Marchal 2024 (22)
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) | 4.3 Fraud, scams, and targeted manipulation
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Impersonation | 4.3 Fraud, scams, and targeted manipulation
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Appropriated Likeness | 4.3 Fraud, scams, and targeted manipulation
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Sockpuppeting | 4.1 Disinformation, surveillance, and influence at scale
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Non-consensual intimate imagery (NCII) | 4.3 Fraud, scams, and targeted manipulation
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Child sexual abuse material (CSAM) | 4.3 Fraud, scams, and targeted manipulation