Lower performance for some languages and social groups
The accuracy and effectiveness of AI decisions and actions depend on group membership: decisions in AI system design and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.
"LMs perform less well in some languages (Joshi et al., 2021; Ruder, 2020)... LM that more accurately captures the language use of one group, compared to another, may result in lower-quality language technologies for the latter. Disadvantaging users based on such traits may be particularly pernicious because attributes such as social class or education background are not typically covered as ‘protected characteristics’ in anti-discrimination law." (p. 16)
Supporting Evidence (1)
"Current large LMs are trained on text that is predominantly in English (Brown et al., 2020; Fedus et al., 2021; Rosset, 2020) or Mandarin Chinese (Du, 2021), in line with a broader trend whereby most NLP research is on English, Mandarin Chinese, and German (Bender, 2019). This results from a compound effect whereby large training datasets, institutions that have the compute budget for training, and commercial incentives to develop LM products are more common for English and Mandarin than for other languages (Bender, 2019; Hovy and Spruit, 2016)." (p. 17)
Part of Discrimination, Exclusion and Toxicity
Other risks from Weidinger et al. (2021) (26)
Discrimination, Exclusion and Toxicity
1.0 Discrimination & Toxicity
Discrimination, Exclusion and Toxicity > Social stereotypes and unfair discrimination
1.1 Unfair discrimination and misrepresentation
Discrimination, Exclusion and Toxicity > Exclusionary norms
1.1 Unfair discrimination and misrepresentation
Discrimination, Exclusion and Toxicity > Toxic language
1.2 Exposure to toxic content
Information Hazards
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Information Hazards > Compromising privacy by leaking private information
2.1 Compromise of privacy by leaking or correctly inferring sensitive information