Toxic output
Risk Domain
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
"Toxic output occurs when the model produces hateful, abusive, and profane (HAP) or obscene content. This also includes behaviors like bullying."
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Supporting Evidence (1)
1. "Hateful, abusive, and profane (HAP) or obscene content can adversely impact and harm people interacting with the model."
Other risks from IBM2025 (63)
Lack of training data transparency
6.5 Governance failure; Entity: Human; Intent: Unintentional; Timing: Pre-deployment
Uncertain data provenance
6.5 Governance failure; Entity: Human; Intent: Other; Timing: Pre-deployment
Data usage restrictions
7.3 Lack of capability or robustness; Entity: Human; Intent: Unintentional; Timing: Pre-deployment
Data acquisition restrictions
7.3 Lack of capability or robustness; Entity: Human; Intent: Unintentional; Timing: Pre-deployment
Data transfer restrictions
7.3 Lack of capability or robustness; Entity: Human; Intent: Unintentional; Timing: Pre-deployment
Personal information in data
2.1 Compromise of privacy by leaking or correctly inferring sensitive information; Entity: AI system; Intent: Unintentional; Timing: Post-deployment