
Risk area 1: Discrimination, Hate speech and Exclusion

Taxonomy of Risks posed by Language Models

Weidinger et al. (2022)

Category: Risk Domain

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

"Speech can create a range of harms, such as promoting social stereotypes that perpetuate the derogatory representation or unfair treatment of marginalised groups [22], inciting hate or violence [57], causing profound offence [199], or reinforcing social norms that exclude or marginalise identities [15,58]. LMs that faithfully mirror harmful language present in the training data can reproduce these harms. Unfair treatment can also emerge from LMs that perform better for some social groups than others [18]. These risks have been widely known, observed and documented in LMs. Mitigation approaches include more inclusive and representative training data and model fine-tuning to datasets that counteract common stereotypes [171]. We now explore these risks in turn."(p. 216)

Sub-categories (4)

Social stereotypes and unfair discrimination

"The reproduction of harmful stereotypes is well-documented in models that represent natural language [32]. Large-scale LMs are trained on text sources, such as digitised books and text on the internet. As a result, the LMs learn demeaning language and stereotypes about groups who are frequently marginalised."

1.1 Unfair discrimination and misrepresentation
AI system · Unintentional · Other

Hate speech and offensive language

"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143,191]. This language risks causing offence, psychological harm, and inciting hate or violence."

1.2 Exposure to toxic content
AI system · Unintentional · Post-deployment

Exclusionary norms

"In language, humans express social categories and norms, which exclude groups who live outside of them [58]. LMs that faithfully encode patterns present in language necessarily encode such norms."

1.1 Unfair discrimination and misrepresentation
AI system · Unintentional · Other

Lower performance for some languages and social groups

"LMs are typically trained in few languages, and perform less well in other languages [95, 162]. In part, this is due to unavailability of training data: there are many widely spoken languages for which no systematic efforts have been made to create labelled training datasets, such as Javanese which is spoken by more than 80 million people [95]. Training data is particularly missing for languages that are spoken by groups who are multilingual and can use a technology in English, or for languages spoken by groups who are not the primary target demographic for new technologies."

1.3 Unequal performance across groups
AI system · Unintentional · Post-deployment

Other risks from Weidinger et al. (2022) (25)