Exclusionary norms

Taxonomy of Risks posed by Language Models

Weidinger et al. (2022)

Sub-category

Risk Domain

1.1 Unfair discrimination and misrepresentation

Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.

"In language, humans express social categories and norms, which exclude groups who live outside of them [58]. LMs that faithfully encode patterns present in language necessarily encode such norms."(p. 216)

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing: Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Supporting Evidence (4)

"Exclusionary norms can manifest in “subtle patterns like referring to women doctors as if doctor itself entails not-woman” [15], emphasis added."(p. 216)

"Where a LM omits, excludes, or subsumes those deviating from a norm into ill-fitting categories, affected individuals may also encounter allocational or representational harm [100, 159]. Exclusionary norms can place a disproportionate burden or “psychological tax” on those who do not comply with these norms or who are trying to change them."(p. 217)

"A LM trained on language data at a particular moment in time risks excluding some groups and creating a “frozen moment” whereby temporary societal arrangements are enshrined in a model without the capacity to update the technology as society develops [70]. The risk, in this case, is that LMs come to represent language from a particular community and point in time, so that the norms, values, categories from that moment get “locked in” [15, 59]."(p. 217)

"Rare entities can become marginalised due to a ‘com- mon token bias’, whereby the LM frequently provides common but false terms in response to a question rather than providing the less common, correct response. For example, GPT-3 was found to ‘often predict common entities such as “America” when the ground- truth answer is instead a rare entity in the training data’, such as Keetmansoop, Namibia [206].1"(p. 217)

Part of Risk area 1: Discrimination, Hate speech and Exclusion

Other risks from Weidinger et al. (2022) (25)

Risk area 1: Discrimination, Hate speech and Exclusion

1.2 Exposure to toxic content

AI system, Unintentional, Other

Risk area 1: Discrimination, Hate speech and Exclusion > Social stereotypes and unfair discrimination

1.1 Unfair discrimination and misrepresentation

AI system, Unintentional, Other

Risk area 1: Discrimination, Hate speech and Exclusion > Hate speech and offensive language

1.2 Exposure to toxic content

AI system, Unintentional, Post-deployment

Risk area 1: Discrimination, Hate speech and Exclusion > Lower performance for some languages and social groups

1.3 Unequal performance across groups

AI system, Unintentional, Post-deployment

Risk area 2: Information Hazards

2.1 Compromise of privacy by leaking or correctly inferring sensitive information

AI system, Unintentional, Post-deployment

Risk area 2: Information Hazards > Compromising privacy by leaking sensitive information

2.1 Compromise of privacy by leaking or correctly inferring sensitive information

AI system, Unintentional, Post-deployment
