This is a research prototype. The data and analyses are preliminary and not yet validated.
Fine-tuning related (Degrading safety training due to benign fine-tuning)

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Gipiškis et al. (2024)

Sub-category

"When downstream providers of AI systems fine-tune AI models to be more suitable for their needs, the resulting AI model can be more likely to produce undesired or harmful outputs (as compared to the non-fine-tuned model), even if the fine-tuning was done with harmless and commonly used data [154]." (p. 15)
