Fine-tuning related (Excessive or overly restrictive safety-tuning)
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
"Excessive safety training or safety tuning can impair the performance of AI systems, leading to overly cautious behavior. As a result, these systems may refuse to answer entirely safe prompts which are partially similar to harmful ones [27]." (p. 15)
Human: Due to a decision or action made by humans
AI system: Due to a decision or action made by an AI system
Other: Due to some other reason or is ambiguous
Not coded

Intentional: Due to an expected outcome from pursuing a goal
Unintentional: Due to an unexpected outcome from pursuing a goal
Other: Without clearly specifying the intentionality
Not coded

Pre-deployment: Occurring before the AI is deployed
Post-deployment: Occurring after the AI model has been trained and deployed
Other: Without a clearly specified time of occurrence
Not coded
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content