Risk area 1: Discrimination, Hate speech and Exclusion
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
"Speech can create a range of harms, such as promoting social stereotypes that perpetuate the derogatory representation or unfair treatment of marginalised groups [22], inciting hate or violence [57], causing profound offence [199], or reinforcing social norms that exclude or marginalise identities [15,58]. LMs that faithfully mirror harmful language present in the training data can reproduce these harms. Unfair treatment can also emerge from LMs that perform better for some social groups than others [18]. These risks have been widely known, observed and documented in LMs. Mitigation approaches include more inclusive and representative training data and model fine-tuning to datasets that counteract common stereotypes [171]. We now explore these risks in turn."(p. 216)
Sub-categories (4)
Social stereotypes and unfair discrimination
"The reproduction of harmful stereotypes is well-documented in models that represent natural language [32]. Large-scale LMs are trained on text sources, such as digitised books and text on the internet. As a result, the LMs learn demeaning language and stereotypes about groups who are frequently marginalised."
1.1 Unfair discrimination and misrepresentation
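The stereotype reproduction described above can be observed directly by probing a masked LM. The snippet below is a toy illustration rather than a validated bias benchmark; the bert-base-uncased checkpoint, the two templates, and the candidate occupations are assumptions chosen for the sketch.

```python
# Illustrative probe of stereotypical associations in a masked LM.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The man worked as a [MASK].",
    "The woman worked as a [MASK].",
]
# Score the same candidate occupations under both templates and compare.
for template in templates:
    print(template)
    for result in fill(template, targets=["doctor", "nurse", "engineer", "teacher"]):
        print(f"  {result['token_str']:>10s}  p={result['score']:.4f}")
```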
"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143,191]. This language risks causing offence, psychological harm, and inciting hate or violence."
1.2 Exposure to toxic content
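One way to limit exposure to the language described above is to screen generated text with a toxicity classifier before it reaches users. The sketch below assumes the Hugging Face transformers library and the publicly available unitary/toxic-bert checkpoint; the 0.5 threshold and the sample sentences are illustrative. Classifier-based filtering is itself imperfect and can over-flag dialectal or reclaimed language, so it complements rather than replaces the data-side mitigations noted earlier.

```python
# Illustrative screening of candidate model outputs with a toxicity classifier.
from transformers import pipeline

toxicity = pipeline("text-classification",
                    model="unitary/toxic-bert",
                    top_k=None,                   # return a score for every label
                    function_to_apply="sigmoid")  # multi-label scores, not a softmax

candidates = [
    "Have a great day!",
    "You people are all worthless.",  # a mild insult the classifier should flag
]
results = toxicity(candidates)        # one list of {label, score} dicts per input
for text, scores in zip(candidates, results):
    flagged = [s["label"] for s in scores if s["score"] > 0.5]
    print(text, "->", flagged or "no labels above threshold")
```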
"In language, humans express social categories and norms, which exclude groups who live outside of them [58]. LMs that faithfully encode patterns present in language necessarily encode such norms."
1.1 Unfair discrimination and misrepresentation

Lower performance for some languages and social groups
"LMs are typically trained in few languages, and perform less well in other languages [95, 162]. In part, this is due to unavailability of training data: there are many widely spoken languages for which no systematic efforts have been made to create labelled training datasets, such as Javanese which is spoken by more than 80 million people [95]. Training data is particularly missing for languages that are spoken by groups who are multilingual and can use a technology in English, or for languages spoken by groups who are not the primary target demographic for new technologies."
1.3 Unequal performance across groups
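The performance gap described above can be made concrete by comparing the perplexity a predominantly English-trained model assigns to roughly parallel sentences in different languages. The sketch below assumes the gpt2 checkpoint and an English/Indonesian pair; the paper's example, Javanese, would illustrate the same point, and the exact numbers depend on the model and sentences chosen.

```python
# Illustrative cross-language perplexity comparison with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentences = {
    "English": "The weather is very nice today.",
    "Indonesian": "Cuaca hari ini sangat bagus.",
}

for language, text in sentences.items():
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean next-token cross-entropy.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{language:>10s}: perplexity = {torch.exp(loss).item():.1f}")
```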
Other risks from Weidinger et al. (2022) (25)

Risk area 2: Information Hazards → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 2: Information Hazards > Compromising privacy by leaking sensitive information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 2: Information Hazards > Compromising privacy or security by correctly inferring sensitive information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 3: Misinformation Harms → 3.0 Misinformation
Risk area 3: Misinformation Harms > Disseminating false or misleading information → 3.1 False or misleading information
Risk area 3: Misinformation Harms > Causing material harm by disseminating false or poor information e.g. in medicine or law → 3.1 False or misleading information