Discrimination, Exclusion and Toxicity

Ethical and social risks of harm from language models

Weidinger et al. (2021)

Source DOI

Sub-categories (4)

Social stereotypes and unfair discrmination

"Perpetuating harmful stereotypes and discrimination is a well-documented harm in machine learning models that represent natural language (Caliskan et al., 2017). LMs that encode discriminatory language or social stereotypes can cause different types of harm... Unfair discrimination manifests in differential treatment or access to resources among individuals or groups based on sensitive traits such as sex, religion, gender, sexual orientation, ability and age."

1.1 Unfair discrimination and misrepresentation

AI systemUnintentionalOther

Exclusionary norms

"In language, humans express social categories and norms. Language models (LMs) that faithfully encode patterns present in natural language necessarily encode such norms and categories...such norms and categories exclude groups who live outside them (Foucault and Sheridan, 2012). For example, defining the term “family” as married parents of male and female gender with a blood-related child, denies the existence of families to whom these criteria do not apply"

1.1 Unfair discrimination and misrepresentation

AI systemUnintentionalOther

Toxic language

"LM’s may predict hate speech or other language that is “toxic”. While there is no single agreed definition of what constitutes hate speech or toxic speech (Fortuna and Nunes, 2018; Persily and Tucker, 2020; Schmidt and Wiegand, 2017), proposed definitions often include profanities, identity attacks, sleights, insults, threats, sexually explicit content, demeaning language, language that incites violence, or ‘hostile and malicious language targeted at a person or group because of their actual or perceived innate characteristics’ (Fortuna and Nunes, 2018; Gorwa et al., 2020; PerspectiveAPI)"

1.2 Exposure to toxic content

AI systemUnintentionalPost-deployment

Lower performance for some languages and social groups

"LMs perform less well in some languages (Joshi et al., 2021; Ruder, 2020)...LM that more accurately captures the language use of one group, compared to another, may result in lower-quality language technologies for the latter. Disadvantaging users based on such traits may be particularly pernicious because attributes such as social class or education background are not typically covered as ‘protected characteristics’ in anti-discrimination law."

1.3 Unequal performance across groups

AI systemUnintentionalPost-deployment