AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
"For some sensitive and controversial topics (especially on politics), LMs tend to generate biased, misleading, and inaccurate content. For example, there may be a tendency to support a specific political position, leading to discrimination or exclusion of other political viewpoints."(p. 3)
Other risks from Sun et al. (2023) (14)
Instruction Attacks
2.2 AI system security vulnerabilities and attacksInstruction Attacks > Goal Hijacking
2.2 AI system security vulnerabilities and attacksInstruction Attacks > Prompt Leaking
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationInstruction Attacks > Role Play Instruction
2.2 AI system security vulnerabilities and attacksInstruction Attacks > Unsafe Instruction Topic
2.2 AI system security vulnerabilities and attacksInstruction Attacks > Inquiry with Unsafe Opinion
2.2 AI system security vulnerabilities and attacks