Discrimination, Exclusion and Toxicity
"Social harms that arise from the language model producing discriminatory or exclusionary speech"(p. 9)
Sub-categories (4)
Social stereotypes and unfair discrimination
"Perpetuating harmful stereotypes and discrimination is a well-documented harm in machine learning models that represent natural language (Caliskan et al., 2017). LMs that encode discriminatory language or social stereotypes can cause different types of harm... Unfair discrimination manifests in differential treatment or access to resources among individuals or groups based on sensitive traits such as sex, religion, gender, sexual orientation, ability and age."
Maps to: 1.1 Unfair discrimination and misrepresentation
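The stereotype encoding documented by Caliskan et al. (2017), quoted above, is typically measured with the Word Embedding Association Test (WEAT). Below is a minimal sketch of the WEAT effect size under toy assumptions: the vectors and the roles assigned to the word sets are invented for illustration, not real embeddings.

```python
# Minimal sketch of the Word Embedding Association Test (WEAT) from
# Caliskan et al. (2017). All vectors below are invented 2-D toys; the
# real test uses pretrained embeddings and curated target/attribute sets.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Difference in mean association between target sets X and Y,
    # normalised by the std of s(w, A, B) over all targets w in X and Y.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# Hypothetical embeddings: X, Y might be names of two demographic groups;
# A, B might be "pleasant" vs "unpleasant" attribute words.
X = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
Y = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]

print(weat_effect_size(X, Y, A, B))  # positive: X leans toward A, Y toward B
```

An effect size near zero indicates no differential association; Caliskan et al. report large effect sizes for several stereotype patterns in off-the-shelf embeddings.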
"In language, humans express social categories and norms. Language models (LMs) that faithfully encode patterns present in natural language necessarily encode such norms and categories...such norms and categories exclude groups who live outside them (Foucault and Sheridan, 2012). For example, defining the term “family” as married parents of male and female gender with a blood-related child, denies the existence of families to whom these criteria do not apply"
Maps to: 1.1 Unfair discrimination and misrepresentation

Toxic language
"LM’s may predict hate speech or other language that is “toxic”. While there is no single agreed definition of what constitutes hate speech or toxic speech (Fortuna and Nunes, 2018; Persily and Tucker, 2020; Schmidt and Wiegand, 2017), proposed definitions often include profanities, identity attacks, sleights, insults, threats, sexually explicit content, demeaning language, language that incites violence, or ‘hostile and malicious language targeted at a person or group because of their actual or perceived innate characteristics’ (Fortuna and Nunes, 2018; Gorwa et al., 2020; PerspectiveAPI)"
Maps to: 1.2 Exposure to toxic content
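As a pointer to how "toxicity" is operationalised in practice, here is a minimal sketch of scoring a string with the Perspective API cited above. The endpoint and response fields follow the public Comment Analyzer API, but the key is a placeholder and the example text is invented.

```python
# Minimal sketch: score a string for toxicity with the Perspective API
# (the PerspectiveAPI citation above). The API key is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: issued via Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload, timeout=10)
    resp.raise_for_status()
    # summaryScore.value is a probability-like score in [0, 1]
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    print(toxicity_score("An invented example sentence."))
```

Note that a single scalar score inherits the definitional ambiguity discussed in the quote: what counts as "toxic" is baked into the classifier's training data.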
"LMs perform less well in some languages (Joshi et al., 2021; Ruder, 2020)...LM that more accurately captures the language use of one group, compared to another, may result in lower-quality language technologies for the latter. Disadvantaging users based on such traits may be particularly pernicious because attributes such as social class or education background are not typically covered as ‘protected characteristics’ in anti-discrimination law."
Maps to: 1.3 Unequal performance across groups
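To make "unequal performance" concrete, here is a minimal sketch of the disaggregated evaluation that surfaces it: score the same model separately per language or social group and report the gap. All groups and outcomes below are invented.

```python
# Minimal sketch of a disaggregated evaluation. The (group, correct)
# records are invented; in practice they would come from running one
# model over a test set annotated with language or group labels.
from collections import defaultdict

results = [
    ("en", True), ("en", True), ("en", True), ("en", False),
    ("sw", True), ("sw", False), ("sw", False), ("sw", False),
]

by_group = defaultdict(list)
for group, correct in results:
    by_group[group].append(correct)

accuracy = {g: sum(v) / len(v) for g, v in by_group.items()}
print(accuracy)  # e.g. {'en': 0.75, 'sw': 0.25}
print("accuracy gap:", max(accuracy.values()) - min(accuracy.values()))
```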
Other risks from Weidinger et al. (2021) (26)

Information Hazards → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Information Hazards > Compromising privacy by leaking private information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Information Hazards > Compromising privacy by correctly inferring private information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Information Hazards > Risks from leaking or correctly inferring sensitive information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Misinformation Harms → 3.0 Misinformation
Misinformation Harms > Disseminating false or misleading information → 3.1 False or misleading information