
Direct Harm Domains (content safety harms)

Category
Risk Domain

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

"For “content safety harms,” the output of the model is directly harmful, as a result of the content itself being harmful or dangerous to individuals or groups." (Gipiškis2024, p. 87)
