Misuse tactics to compromise GenAI systems (Model integrity)
AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise user expectation of privacy, assist identity theft, or cause loss of confidential intellectual property.
-
Sub-categories (7)
Prompt injection
"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have no separation between system instructions and user data to produce a harmful output (Perez and Ribeiro, 2022). While researchers may use similar techniques to test the robustness of GenAI models, malicious actors can also leverage them. For example, they might flood a model with manipulative prompts to cause denial-of-service attacks or to bypass an AI detection software."
2.2 AI system security vulnerabilities and attacksAdversarial input
"Adversarial Inputs involve modifying individual input data to cause a model to malfunction. These modifications, which are often imperceptible to humans, exploit how the model makes decisions to produce errors (Wallace et al., 2019) and can be applied to text, but also to images, audio, or video (e.g. changing pixels in an image of a panda in a way that causes a model to label it as a gibbon).6"
2.2 AI system security vulnerabilities and attacksJailbreaking
"Jailbreaking aims to bypass or remove restrictions and safety filters placed on a GenAI model completely (Chao et al., 2023; Shen et al., 2023). This gives the actor free rein to generate any output, regardless of its content being harmful, biassed, or offensive. All three of these are tactics that manipulate the model into producing harmful outputs against its design. The difference is that prompt injections and adversarial inputs usually seek to steer the model towards producing harmful or incorrect outputs from one query, whereas jailbreaking seeks to dismantle a model’s safety mechanisms in their entirety."
2.2 AI system security vulnerabilities and attacksModel diversion
"Model Diversion takes model manipulation one step further, by repurposing (often open-source) generative AI models in a way that diverts them from their intended functionality or from the use cases envisioned by their developers (Lin et al., 2024). An example of this is training the BERT open source model on the DarkWeb to create DarkBert.7"
4.2 Cyberattacks, weapon development or use, and mass harmModel extraction
"Data Exfiltration goes beyond revealing private information, and involves illicitly obtaining the training data used to build a model that may be sensitive or proprietary. Model Extraction is the same attack, only directed at the model instead of the training data — it involves obtaining the architecture, parameters, or hyper-parameters of a proprietary model (Carlini et al., 2024)."
2.2 AI system security vulnerabilities and attacksSteganography
"Steganography is the practice of hiding coded messages in GenAI model outputs, which may allow malicious actors to communicate covertly.8"
2.2 AI system security vulnerabilities and attacksPoisoning
"Data Poisoning involves deliberately corrupting a model’s training dataset to introduce vulnerabilities, derail its learning process, or cause it to make incorrect predictions (Carlini et al., 2023). For example, the tool Nightshade is a data poisoning tool, which allows artists to add invisible changes to the pixels in their art before uploading online, to break any models that use it for training.9 Such attacks exploit the fact that most GenAI models are trained on publicly available datasets like images and videos scraped from the web, which malicious actors can easily compromise."
2.2 AI system security vulnerabilities and attacksOther risks from Marchal2024 (22)
Misuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness)
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Impersonation
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Appropriated Likeness
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Sockpuppeting
4.1 Disinformation, surveillance, and influence at scaleMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Non-consensual intimate imagery (NCII)
4.3 Fraud, scams, and targeted manipulationMisuse tactics that exploit GenAI capabilities (Realistic depiction of human likeness) > Child sexual abuse material (CSAM)
4.3 Fraud, scams, and targeted manipulation