Toxicity and Bias Tendencies
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
"Extensive data collection in LLMs brings toxic content and stereotypical bias into the training data."(p. 4)
Sub-categories (2)
Toxic Training Data
"Following previous studies [96], [97], toxic data in LLMs is defined as rude, disrespectful, or unreasonable language that is opposite to a polite, positive, and healthy language environment, including hate speech, offensive utterance, profanities, and threats [91]."
1.2 Exposure to toxic contentBiased Training Data
"Compared with the definition of toxicity, the definition of bias is more subjective and contextdependent. Based on previous work [97], [101], we describe the bias as disparities that could raise demographic differences among various groups, which may involve demographic word prevalence and stereotypical contents. Concretely, in massive corpora, the prevalence of different pronouns and identities could influence an LLM’s tendency about gender, nationality, race, religion, and culture [4]. For instance, the pronoun He is over-represented compared with the pronoun She in the training corpora, leading LLMs to learn less context about She and thus generate He with a higher probability [4], [102]. Furthermore, stereotypical bias [103] which refers to overgeneralized beliefs about a particular group of people, usually keeps incorrect values and is hidden in the large-scale benign contents. In effect, defining what should be regarded as a stereotype in the corpora is still an open problem."
1.1 Unfair discrimination and misrepresentationOther risks from Cui et al. (2024) (49)
Harmful Content
1.2 Exposure to toxic contentHarmful Content > Bias
1.1 Unfair discrimination and misrepresentationHarmful Content > Toxicity
1.2 Exposure to toxic contentHarmful Content > Privacy Leakage
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationUntruthful Content
3.1 False or misleading informationUntruthful Content > Factuality Errors
3.1 False or misleading information