Harms of Representation and Other Biases
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.
"A pretrained LLM generally has many of the stereotypical biases commonly present in the human society (Touvron et al., 2023). This makes it difficult for users to trust that LLMs will work well for them and not produce unfair or biased responses. Appropriate finetuning can effectively limit the bias displayed in LLM outputs in a variety of situations, e.g. when models are explicitly prompted with stereotypes (Wang et al., 2023k), but it does not ‘solve’ the problem. Even after finetuning, biases often resurface when deliberately elicited (Wang et al., 2023k), or under novel scenarios, e.g. in writing reference letters (Wan et al., 2023a), generating synthetic training data (Yu et al., 2023c), screening resumes (Yin et al., 2024) or when used as LLM-agents (Pan et al., 2024)."(p. 90)
Supporting Evidence (1)
"The biases outputs are often much more prominent in low-resource languages (Yong et al., 2023) and in dialects used by marginalized groups (Hofmann et al., 2024). There is a need for research to develop better and more comprehensive tools for the detection of bias, toxicity (Wen et al., 2023; Wang and Chang, 2022), and other kinds of inappropriate behaviors. The current tools primarily focus on detecting content that is explicitly toxic and offensive, however, the issue of bias goes beyond commonly studied subjects of biases, such as race and gender (Hofmann et al., 2024). For instance, depending on the finetuning data, LLMs may also develop a bias towards particular political ideologies (Rutinowski et al., 2023). These biases are still relatively poorly understood, and there is a need for more extensive evaluations of current LLMs in novel scenarios to better understand the propensity of LLMs to enact such biases."(p. 90)
Other risks from Anwar et al. (2024) (26)
Agentic LLMs Pose Novel Risks (part of: 7.2 AI possessing dangerous capabilities)
Multi-Agent Safety Is Not Assured by Single-Agent Safety (part of: 7.6 Multi-agent risks)
Dual-Use Capabilities Enable Malicious Use and Misuse of LLMs (part of: 4.0 Malicious Actors & Misuse)
Corporate power may impede effective governance (part of: 6.1 Power centralization and unfair distribution of benefits)
Jailbreaks and Prompt Injections Threaten Security of LLMs (part of: 2.2 AI system security vulnerabilities and attacks)
Vulnerability to Poisoning and Backdoors (part of: 2.2 AI system security vulnerabilities and attacks)