Biased Training Data
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
"Compared with the definition of toxicity, the definition of bias is more subjective and contextdependent. Based on previous work [97], [101], we describe the bias as disparities that could raise demographic differences among various groups, which may involve demographic word prevalence and stereotypical contents. Concretely, in massive corpora, the prevalence of different pronouns and identities could influence an LLM’s tendency about gender, nationality, race, religion, and culture [4]. For instance, the pronoun He is over-represented compared with the pronoun She in the training corpora, leading LLMs to learn less context about She and thus generate He with a higher probability [4], [102]. Furthermore, stereotypical bias [103] which refers to overgeneralized beliefs about a particular group of people, usually keeps incorrect values and is hidden in the large-scale benign contents. In effect, defining what should be regarded as a stereotype in the corpora is still an open problem."(p. 7)
Part of Toxicity and Bias Tendencies
Other risks from Cui et al. (2024) (49)
Harmful Content
1.2 Exposure to toxic contentHarmful Content > Bias
1.1 Unfair discrimination and misrepresentationHarmful Content > Toxicity
1.2 Exposure to toxic contentHarmful Content > Privacy Leakage
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationUntruthful Content
3.1 False or misleading informationUntruthful Content > Factuality Errors
3.1 False or misleading information