Safety
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
"Avoiding unsafe and illegal outputs, and leaking private information" (p. 8)
Sub-categories (5)
Violence
LLMs have been found to generate answers that contain violent content and to respond to questions soliciting information about violent behavior.
Maps to: 1.2 Exposure to toxic content

Unlawful Conduct
LLMs have been shown to be a convenient tool for soliciting advice on illegally accessing, purchasing, and creating illegal substances, as well as on dangerous ways of using them.
Maps to: 1.2 Exposure to toxic content

Harms to Minor
LLMs can be leveraged to solicit answers containing content that is harmful to children and youth.
Maps to: 1.2 Exposure to toxic content

Adult Content
LLMs can generate sexually explicit conversations and erotic text, and can recommend websites with sexual content.
Maps to: 1.2 Exposure to toxic content

Privacy Violation
Machine learning models are known to be vulnerable to data privacy attacks: techniques by which attackers or malicious users extract private information from the model or the surrounding system, usually by querying the model in a specially designed way.
Maps to: 2.1 Compromise of privacy by leaking or correctly inferring sensitive information

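To make the "specially designed queries" pattern concrete, below is a minimal sketch of a training-data extraction probe of the kind used in published privacy audits: sample many continuations of a prefix that plausibly precedes memorized personal data, then scan the outputs for identifiers. The model name, prompt prefix, sampling settings, and email regex are all illustrative choices, not taken from Liu et al. (2024) or this entry.

```python
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in model; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

prompt = "Contact me at"  # prefix likely to precede memorized contact details
inputs = tok(prompt, return_tensors="pt")

# Sample multiple continuations; memorized strings tend to recur verbatim
# across samples, which is what extraction attacks exploit.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=40,
    max_new_tokens=32,
    num_return_sequences=20,
    pad_token_id=tok.eos_token_id,
)

email_re = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
for seq in outputs:
    text = tok.decode(seq, skip_special_tokens=True)
    for hit in email_re.findall(text):
        print("candidate leaked address:", hit)
```

In a real audit this loop would run over many prefixes, and candidate strings would be checked against known training data; frequent verbatim recurrence is evidence of memorization rather than coincidence.
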
Other risks from Liu et al. (2024) (34)

Reliability → 3.1 False or misleading information
Reliability > Misinformation → 3.1 False or misleading information
Reliability > Hallucination → 3.1 False or misleading information
Reliability > Inconsistency → 7.3 Lack of capability or robustness
Reliability > Miscalibration → 3.1 False or misleading information
Reliability > Sycophancy → 3.1 False or misleading information
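
Because these cross-taxonomy links are simple key-value pairs, they can be encoded directly for programmatic filtering or aggregation. The sketch below (dict structure and names are my own, not part of the repository) transcribes the mappings listed above and groups the source risks by their mapped domain category.

```python
from collections import defaultdict

# Mappings transcribed from the entries above; the encoding is illustrative.
LIU_2024_TO_DOMAIN = {
    "Safety > Violence": "1.2 Exposure to toxic content",
    "Safety > Unlawful Conduct": "1.2 Exposure to toxic content",
    "Safety > Harms to Minor": "1.2 Exposure to toxic content",
    "Safety > Adult Content": "1.2 Exposure to toxic content",
    "Safety > Privacy Violation": "2.1 Compromise of privacy by leaking or correctly inferring sensitive information",
    "Reliability": "3.1 False or misleading information",
    "Reliability > Misinformation": "3.1 False or misleading information",
    "Reliability > Hallucination": "3.1 False or misleading information",
    "Reliability > Inconsistency": "7.3 Lack of capability or robustness",
    "Reliability > Miscalibration": "3.1 False or misleading information",
    "Reliability > Sycophancy": "3.1 False or misleading information",
}

# Invert the mapping to see which source risks land in each domain category.
by_domain = defaultdict(list)
for risk, domain in LIU_2024_TO_DOMAIN.items():
    by_domain[domain].append(risk)

for domain, risks in sorted(by_domain.items()):
    print(f"{domain}: {len(risks)} risk(s)")
```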