
Jailbreak in LLM Malicious Use - Poisoning Training Data

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

Wang et al. (2025)

Risk Domain (sub-category)

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"In the data collecting and pre-training phase, malicious adversaries can Jailbreak LLMs through poisoning their training data to make the model to output harmful content."(p. 21)
