This is a research prototype. The data and analyses are preliminary and not yet validated.
Fine-tuning related (Poisoning models during instruction tuning)

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Gipiškis et al. (2024)

Sub-category
Risk Domain: Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, as instruction tuning requires a relatively small number of samples for fine-tuning [155, 211]. Anonymous crowdsourcing efforts may be employed in collecting instruction tuning datasets and can further contribute to poisoning attacks [187]. These attacks might be harder to detect than traditional data poisoning attacks."(p. 14)
