Not-Suitable-for-Work (NSFW) Prompts
Risk Domain
AI that exposes users to harmful, abusive, unsafe, or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
"Inputting a prompt contain[ing] an unsafe topic (e.g., not-suitable-for-work (NSFW) content) by a benign user." (p. 4)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Other risks from Cui et al. (2024) (49)
Risk | Risk Domain | Entity | Intent | Timing
Harmful Content | 1.2 Exposure to toxic content | AI system | Unintentional | Post-deployment
Harmful Content > Bias | 1.1 Unfair discrimination and misrepresentation | AI system | Unintentional | Other
Harmful Content > Toxicity | 1.2 Exposure to toxic content | AI system | Unintentional | Post-deployment
Harmful Content > Privacy Leakage | 2.1 Compromise of privacy by leaking or correctly inferring sensitive information | AI system | Unintentional | Post-deployment
Untruthful Content | 3.1 False or misleading information | AI system | Unintentional | Post-deployment
Untruthful Content > Factuality Errors | 3.1 False or misleading information | AI system | Unintentional | Post-deployment