Risk area 1: Discrimination, Hate speech and Exclusion
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
"Speech can create a range of harms, such as promoting social stereotypes that perpetuate the derogatory representation or unfair treatment of marginalised groups [22], inciting hate or violence [57], causing profound offence [199], or reinforcing social norms that exclude or marginalise identities [15,58]. LMs that faithfully mirror harmful language present in the training data can reproduce these harms. Unfair treatment can also emerge from LMs that perform better for some social groups than others [18]. These risks have been widely known, observed and documented in LMs. Mitigation approaches include more inclusive and representative training data and model fine-tuning to datasets that counteract common stereotypes [171]. We now explore these risks in turn."(p. 216)
Sub-categories (4)
Social stereotypes and unfair discrimination
"The reproduction of harmful stereotypes is well-documented in models that represent natural language [32]. Large-scale LMs are trained on text sources, such as digitised books and text on the internet. As a result, the LMs learn demeaning language and stereotypes about groups who are frequently marginalised."
1.1 Unfair discrimination and misrepresentation
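The stereotype reproduction described above can be observed directly by probing a masked LM. The snippet below is a toy illustration rather than a validated bias benchmark; the bert-base-uncased checkpoint, the two templates, and the candidate occupations are assumptions chosen for the sketch.

```python
# Illustrative probe of stereotypical associations in a masked LM.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The man worked as a [MASK].",
    "The woman worked as a [MASK].",
]
# Score the same candidate occupations under both templates and compare.
for template in templates:
    print(template)
    for result in fill(template, targets=["doctor", "nurse", "engineer", "teacher"]):
        print(f"  {result['token_str']:>10s}  p={result['score']:.4f}")
```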
"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143,191]. This language risks causing offence, psychological harm, and inciting hate or violence."
1.2 Exposure to toxic content
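One way to limit exposure to the language described above is to screen generated text with a toxicity classifier before it reaches users. The sketch below assumes the Hugging Face transformers library and the publicly available unitary/toxic-bert checkpoint; the 0.5 threshold and the sample sentences are illustrative. Classifier-based filtering is itself imperfect and can over-flag dialectal or reclaimed language, so it complements rather than replaces the data-side mitigations noted earlier.

```python
# Illustrative screening of candidate model outputs with a toxicity classifier.
from transformers import pipeline

toxicity = pipeline("text-classification",
                    model="unitary/toxic-bert",
                    top_k=None,                   # return a score for every label
                    function_to_apply="sigmoid")  # multi-label scores, not a softmax

candidates = [
    "Have a great day!",
    "You people are all worthless.",  # a mild insult the classifier should flag
]
results = toxicity(candidates)        # one list of {label, score} dicts per input
for text, scores in zip(candidates, results):
    flagged = [s["label"] for s in scores if s["score"] > 0.5]
    print(text, "->", flagged or "no labels above threshold")
```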
"In language, humans express social categories and norms, which exclude groups who live outside of them [58]. LMs that faithfully encode patterns present in language necessarily encode such norms."
1.1 Unfair discrimination and misrepresentation

Lower performance for some languages and social groups
"LMs are typically trained in few languages, and perform less well in other languages [95, 162]. In part, this is due to unavailability of training data: there are many widely spoken languages for which no systematic efforts have been made to create labelled training datasets, such as Javanese which is spoken by more than 80 million people [95]. Training data is particularly missing for languages that are spoken by groups who are multilingual and can use a technology in English, or for languages spoken by groups who are not the primary target demographic for new technologies."
1.3 Unequal performance across groups
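The performance gap described above can be made concrete by comparing the perplexity a predominantly English-trained model assigns to roughly parallel sentences in different languages. The sketch below assumes the gpt2 checkpoint and an English/Indonesian pair; the paper's example, Javanese, would illustrate the same point, and the exact numbers depend on the model and sentences chosen.

```python
# Illustrative cross-language perplexity comparison with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentences = {
    "English": "The weather is very nice today.",
    "Indonesian": "Cuaca hari ini sangat bagus.",
}

for language, text in sentences.items():
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean next-token cross-entropy.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{language:>10s}: perplexity = {torch.exp(loss).item():.1f}")
```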
Other risks from Weidinger et al. (2022) (25)

Risk area 2: Information Hazards → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 2: Information Hazards > Compromising privacy by leaking sensitive information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 2: Information Hazards > Compromising privacy or security by correctly inferring sensitive information → 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Risk area 3: Misinformation Harms → 3.0 Misinformation
Risk area 3: Misinformation Harms > Disseminating false or misleading information → 3.1 False or misleading information
Risk area 3: Misinformation Harms > Causing material harm by disseminating false or poor information e.g. in medicine or law → 3.1 False or misleading information