Fine-tuning related (Degrading safety training due to benign fine-tuning)
"When downstream providers of AI systems fine-tune AI models to be more suitable for their needs, the resulting AI model can be more likely to produce undesired or harmful outputs (as compared to the non-fine-tuned model), even if the fine-tuning was done with harmless and commonly used data [154]."(p. 15)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Gipiškis2024 (144)
| Risk category | Subdomain | Entity | Intent | Timing |
|---|---|---|---|---|
| Direct Harm Domains (content safety harms) | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |
| Direct Harm Domains (content safety harms) > Violence and extremism | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |
| Direct Harm Domains (content safety harms) > Hate and toxicity | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |
| Direct Harm Domains (content safety harms) > Sexual content | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |
| Direct Harm Domains (content safety harms) > Child harm | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |
| Direct Harm Domains (content safety harms) > Self-harm | 1.2 Exposure to toxic content | Not coded | Not coded | Not coded |