
Hate speech and offensive language

Taxonomy of Risks posed by Language Models

Weidinger et al. (2022)

Sub-category of risk domain: AI that exposes users to harmful, abusive, unsafe, or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143, 191]. This language risks causing offence, psychological harm, and inciting hate or violence." (p. 216)

Part of Risk area 1: Discrimination, Hate speech and Exclusion
