
Lower performance for some languages and social groups

Ethical and social risks of harm from language models

Weidinger et al. (2021)

Sub-category
Risk Domain

Accuracy and effectiveness of AI decisions and actions are dependent on group membership, where decisions in AI system design and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.

"LMs perform less well in some languages (Joshi et al., 2021; Ruder, 2020)...LM that more accurately captures the language use of one group, compared to another, may result in lower-quality language technologies for the latter. Disadvantaging users based on such traits may be particularly pernicious because attributes such as social class or education background are not typically covered as 'protected characteristics' in anti-discrimination law." (p. 16)
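The disparity described above can be made concrete as a per-group performance gap: evaluate the model separately on each user group and compare the scores. The sketch below is a minimal illustration only; the group names and judged-correct/incorrect results are invented, not drawn from Weidinger et al.

```python
# Minimal sketch: quantifying a performance gap across user groups.
# All group names and evaluation results below are hypothetical.

def accuracy(predictions, labels):
    """Fraction of model outputs that match the reference labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Invented per-group evaluation data: (model outputs, reference labels),
# where 1 marks an output judged correct and 0 an output judged incorrect.
results = {
    "group_a": ([1, 1, 1, 0, 1], [1, 1, 1, 1, 1]),
    "group_b": ([1, 0, 0, 1, 0], [1, 1, 1, 1, 1]),
}

per_group = {g: accuracy(p, l) for g, (p, l) in results.items()}
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)  # group_a scores 0.8, group_b scores 0.4; gap 0.4
```

A gap of this kind, measured on real per-language or per-dialect evaluation sets, is what the quoted passage describes: the same system delivers systematically lower-quality technology to one group than to another.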

Supporting Evidence (1)

1.
"Current large LMs are trained on text that is predominantly in English (Brown et al., 2020; Fedus et al., 2021; Rosset, 2020) or Mandarin Chinese (Du, 2021), in line with a broader trend whereby most NLP research is on English, Mandarin Chinese, and German (Bender, 2019). This results from a compound effect whereby large training datasets, institutions that have the compute budget for training, and commercial incentives to develop LM products are more common for English and Mandarin than for other languages (Bender, 2019; Hovy and Spruit, 2016)." (p. 17)

Part of Discrimination, Exclusion and Toxicity
