Model Attacks
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
Model attacks exploit the vulnerabilities of LLMs, aiming to steal valuable information or lead to incorrect responses.(p. 4)
Sub-categories (6)
Extraction Attacks
"Extraction attacks [137] allow an adversary to query a black-box victim model and build a substitute model by training on the queries and responses. The substitute model could achieve almost the same performance as the victim model. While it is hard to fully replicate the capabilities of LLMs, adversaries could develop a domainspecific model that draws domain knowledge from LLMs"
2.2 AI system security vulnerabilities and attacksInference Attacks
"Inference attacks [150] include membership inference attacks, property inference attacks, and data reconstruction attacks. These attacks allow an adversary to infer the composition or property information of the training data. Previous works [67] have demonstrated that inference attacks could easily work in earlier PLMs, implying that LLMs are also possible to be attacked"
2.2 AI system security vulnerabilities and attacksPoisoning Attacks
"Poisoning attacks [143] could influence the behavior of the model by making small changes to the training data. A number of efforts could even leverage data poisoning techniques to implant hidden triggers into models during the training process (i.e., backdoor attacks). Many kinds of triggers in text corpora (e.g., characters, words, sentences, and syntax) could be used by the attackers.""
2.2 AI system security vulnerabilities and attacksOverhead Attacks
"Overhead attacks [146] are also named energy-latency attacks. For example, an adversary can design carefully crafted sponge examples to maximize energy consumption in an AI system. Therefore, overhead attacks could also threaten the platforms integrated with LLMs."
2.2 AI system security vulnerabilities and attacksNovel Attacks on LLMs
Table of examples has: "Prompt Abstraction Attacks [147]: Abstracting queries to cost lower prices using LLM’s API. Reward Model Backdoor Attacks [148]: Constructing backdoor triggers on LLM’s RLHF process. LLM-based Adversarial Attacks [149]: Exploiting LLMs to construct samples for model attacks"
2.2 AI system security vulnerabilities and attacksEvasion Attacks
"Evasion attacks [145] target to cause significant shifts in model’s prediction via adding perturbations in the test samples to build adversarial examples. In specific, the perturbations can be implemented based on word changes, gradients, etc."
2.2 AI system security vulnerabilities and attacksOther risks from Cui et al. (2024) (49)
Harmful Content
1.2 Exposure to toxic contentHarmful Content > Bias
1.1 Unfair discrimination and misrepresentationHarmful Content > Toxicity
1.2 Exposure to toxic contentHarmful Content > Privacy Leakage
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationUntruthful Content
3.1 False or misleading informationUntruthful Content > Factuality Errors
3.1 False or misleading information