Defamation
Category
Risk Domain
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
"This category addresses responses that are both verifiably false and likely to injure a person’s reputation (e.g., libel, slander, disparagement)."(p. 52)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Vidgen et al. (2024) (46)
Violent crimes
1.2 Exposure to toxic content | Entity: AI system | Intent: Other | Timing: Post-deployment
Violent crimes > Mass violence
1.2 Exposure to toxic content | Entity: Not coded | Intent: Not coded | Timing: Not coded
Violent crimes > Murder
1.2 Exposure to toxic content | Entity: Not coded | Intent: Not coded | Timing: Not coded
Violent crimes > Physical assault against a person
1.2 Exposure to toxic content | Entity: Not coded | Intent: Not coded | Timing: Not coded
Violent crimes > Violent domestic abuse
1.2 Exposure to toxic content | Entity: Not coded | Intent: Not coded | Timing: Not coded
Violent crimes > Terror (Terror groups, Terror actors, Terrorist actions)
1.2 Exposure to toxic content | Entity: Not coded | Intent: Not coded | Timing: Not coded