Jailbreak in LLM Malicious Use - Poisoning Training Data
Risk Domain
2.2 AI system security vulnerabilities and attacks: vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"In the data collecting and pre-training phase, malicious adversaries can Jailbreak LLMs through poisoning their training data to make the model to output harmful content."(p. 21)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Wang et al. (2025) (11)
Risk | Risk Domain | Entity | Intent | Timing
Privacy - Membership Inference Attack (MIA) (sketched below) | 2.2 AI system security vulnerabilities and attacks | Human | Intentional | Post-deployment
Privacy - Data Extraction Attack (DEA) | 2.2 AI system security vulnerabilities and attacks | Human | Intentional | Post-deployment
Privacy - Prompt Inversion Attack (PIA) | 2.2 AI system security vulnerabilities and attacks | Human | Intentional | Post-deployment
Privacy - Attribute Inference Attack (AIA) | 2.2 AI system security vulnerabilities and attacks | Human | Intentional | Post-deployment
Privacy - Model Extraction Attack (MEA) | 2.2 AI system security vulnerabilities and attacks | Human | Intentional | Post-deployment
Hallucination | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
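As a concrete illustration of the first row above: a common baseline membership inference attack simply thresholds the target model's per-example loss, since examples seen during training tend to score lower. This is a sketch of that generic setup; `query_model_loss` is a hypothetical stand-in for whatever interface exposes the model's loss, and the calibration comment reflects common practice rather than the specific attacks surveyed by Wang et al. (2025).

```python
# Minimal sketch of a loss-threshold membership inference attack (MIA).
# `query_model_loss` is a hypothetical stand-in; replace it with a real
# query against the target model before using this for anything.

def query_model_loss(example: str) -> float:
    """Hypothetical query returning the target model's loss (e.g. average
    negative log-likelihood) on one text example."""
    raise NotImplementedError("replace with a real query to the target model")

def is_training_member(example: str, threshold: float) -> bool:
    """Guess membership: training members tend to have lower loss. The
    threshold is typically calibrated on data known to be outside the
    training set, or via shadow models trained by the attacker."""
    return query_model_loss(example) < threshold
```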