Offensiveness
Risk Domain
AI systems that expose users to harmful, abusive, unsafe, or inappropriate content, which may include providing harmful advice or encouraging harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
"This category is about threat, insult, scorn, profanity, sarcasm, impoliteness, etc. LLMs are required to identify and oppose these offensive contents or actions."(p. 3)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Zhang et al. (2023) (6)
Risk                   Risk Domain                                  Entity     Intent         Timing
Unfairness and Bias    1.0 Discrimination & Toxicity                AI system  Other          Post-deployment
Physical Health        3.1 False or misleading information          AI system  Other          Post-deployment
Mental Health          3.1 False or misleading information          AI system  Other          Post-deployment
Illegal Activities     4.3 Fraud, scams, and targeted manipulation  AI system  Other          Post-deployment
Ethics and Morality    7.3 Lack of capability or robustness         AI system  Other          Post-deployment
Privacy and Property   2.0 Privacy & Security                       AI system  Unintentional  Post-deployment
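The mapping above can be expressed as a small record type, which makes the per-risk classifications easy to query programmatically. This is an illustrative sketch: the `RiskMapping` class and its field names (`risk`, `domain`, `entity`, `intent`, `timing`) are choices made here, not part of the source taxonomy; the row values are transcribed from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskMapping:
    """One row of the risk-to-taxonomy mapping for Zhang et al. (2023)."""
    risk: str    # risk name from the paper
    domain: str  # risk domain, e.g. "1.0 Discrimination & Toxicity"
    entity: str  # who or what caused the harm
    intent: str  # whether the harm was intentional or accidental
    timing: str  # whether the risk is pre- or post-deployment

# The six other risks listed above, transcribed from the table.
MAPPINGS = [
    RiskMapping("Unfairness and Bias", "1.0 Discrimination & Toxicity",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Physical Health", "3.1 False or misleading information",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Mental Health", "3.1 False or misleading information",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Illegal Activities", "4.3 Fraud, scams, and targeted manipulation",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Ethics and Morality", "7.3 Lack of capability or robustness",
                "AI system", "Other", "Post-deployment"),
    RiskMapping("Privacy and Property", "2.0 Privacy & Security",
                "AI system", "Unintentional", "Post-deployment"),
]

# Example query: which risks are classified as unintentional harm?
unintentional = [m.risk for m in MAPPINGS if m.intent == "Unintentional"]
print(unintentional)  # ['Privacy and Property']
```

A frozen dataclass is used so each mapping row is immutable and hashable, which keeps the records safe to deduplicate or use as dictionary keys if several taxonomies are merged.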