
Poisoning Attacks

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Liu et al. (2024)

Risk Domain: Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

Poisoning attacks "fool the model by manipulating the training data, usually performed on classification models" (p. 27).
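As a minimal illustration of the kind of training-data manipulation described above, the sketch below shows label flipping on a binary classification dataset. The function name, parameters, and dataset are illustrative only and do not come from the survey:

```python
import numpy as np

def flip_labels(y, fraction, rng_seed=0):
    """Label-flipping poisoning sketch: invert the labels of a random
    fraction of training examples (binary 0/1 labels assumed)."""
    rng = np.random.default_rng(rng_seed)
    y_poisoned = y.copy()
    n_flip = int(len(y) * fraction)
    # Pick distinct example indices to corrupt.
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

y = np.array([0, 1] * 50)          # toy binary labels
y_bad = flip_labels(y, 0.2)        # poison 20% of the labels
print((y != y_bad).sum())          # 20 labels flipped
```

A classifier trained on the poisoned labels would learn a corrupted decision boundary, which is the failure mode this risk entry describes.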

Part of Robustness
