Toxicity
Risk Domain
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
language being rude, disrespectful, threatening, or identity-attacking toward certain groups of the user population (culture, race, gender, etc.) (p. 25)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Supporting Evidence (1)
1. in the training dataset of LLMs can contain a non-negligible portion of toxic comments (p. 25)
Part of Social Norm
Other risks from Liu et al. (2024) (34)
Risk category | Domain | Entity | Intent | Timing
Reliability | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
Reliability > Misinformation | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
Reliability > Hallucination | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
Reliability > Inconsistency | 7.3 Lack of capability or robustness | AI system | Unintentional | Post-deployment
Reliability > Miscalibration | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
Reliability > Sycophancy | 3.1 False or misleading information | AI system | Intentional | Post-deployment