Harmful Content
AI that exposes users to harmful, abusive, unsafe, or inappropriate content; this may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
"The LLM-generated content sometimes contains biased, toxic, and private information" (p. 4)
Sub-categories (3):
Bias
"The training datasets of LLMs may contain biased information that leads LLMs to generate outputs with social biases"
Mapped to: 1.1 Unfair discrimination and misrepresentation

Toxicity
"Toxicity means the generated content contains rude, disrespectful, and even illegal information"
Mapped to: 1.2 Exposure to toxic content

Privacy Leakage
"Privacy Leakage means the generated content includes sensitive personal information"
Mapped to: 2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Other risks from Cui et al. (2024) (49):

Untruthful Content → 3.1 False or misleading information
Untruthful Content > Factuality Errors → 3.1 False or misleading information
Untruthful Content > Faithfulness Errors → 3.1 False or misleading information
Unhelpful Uses → 4.3 Fraud, scams, and targeted manipulation
Unhelpful Uses > Academic Misconduct → 4.3 Fraud, scams, and targeted manipulation
Unhelpful Uses > Copyright Violation → 6.3 Economic and cultural devaluation of human effort