
Fine-tuning related (Poisoning models during instruction tuning)

Sub-category

Fine-tuning related (Poisoning models during instruction tuning)

Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, as instruction tuning requires a relatively small number of samples for fine-tuning [155, 211]. Anonymous crowdsourcing efforts may be employed in collecting instruction tuning datasets and can further contribute to poisoning attacks [187]. These attacks might be harder to detect than traditional data poisoning attacks."(p. 14)
