Value-related risks in LLMs
AI systems acting in conflict with ethical standards or with human goals or values, especially the goals of designers or users. These misaligned behaviors may be introduced by humans during design and development (e.g., through reward hacking or goal misgeneralisation), or may result from AI using dangerous capabilities (e.g., manipulation, deception, situational awareness) to seek power, self-proliferate, or achieve other goals.
"As the general capabilities of LLM-empowered systems improve, the negative consequences and risks induced by these systems also get increasingly alarming accordingly, especially in high-stakes areas [28, 146]. Although they may not be intentionally introduced, severe problematic issues related to human values can be raised. Specifically, even before language models become extremely large, pre-trained language models have already exhibited a certain degree of value judgments. For example, Schramowski et al. [171] reveal the existence of the moral direction with the sentence embeddings of moral questions. However, the distribution of the pre-training corpora may not match exactly with that of the human society [56] and pieces of knowledge are not guaranteed to be equally learned. As a result, value mismatches may occur."(p. 16)
Supporting Evidence (1)
"It has been shown that in unambiguous scenarios with correct answers (e.g., Should I kill a pedestrian on the road?), most LLMs make the same moral choices as the commonsense. However, in ambiguous scenarios with no commonsense agreements (e.g., Should I tell a white lie?), some models show clear inclinations regardless of the inherent moral ambiguity, where human-aligned models show similar intra-model preferences. As such, moral biases are elicited. Similarly, dimensions such as fairness [60, 78, 116, 152], safety [68, 78, 187, 243], legality [78], and offense [41] are attended to and raised caution of. Circumstances like these will be especially concerning if some populations are put into more disadvantaged positions, which exacerbates the existing social inequality and injustice. For example, Santurkar et al. [169] show that LLMs are left-leaning and make communities such as the elderly, Mormon, and widowed underrepresented."(p. 16)
Other risks from Wang et al. (2025) (11)
Privacy - Membership Inference Attack (MIA): 2.2 AI system security vulnerabilities and attacks (a baseline sketch follows this list)
Privacy - Data Extraction Attack (DEA): 2.2 AI system security vulnerabilities and attacks
Privacy - Prompt Inversion Attack (PIA): 2.2 AI system security vulnerabilities and attacks
Privacy - Attribute Inference Attack (AIA): 2.2 AI system security vulnerabilities and attacks
Privacy - Model Extraction Attack (MEA): 2.2 AI system security vulnerabilities and attacks
Hallucination: 3.1 False or misleading information
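The first privacy entry has a well-known loss-threshold baseline (in the spirit of Yeom et al.): text seen during training tends to receive lower loss than unseen text. The sketch below illustrates that generic idea only, not Wang et al.'s formulation; the target model and threshold value are assumptions.

```python
# Minimal loss-threshold membership inference sketch: candidate texts
# with unusually low average loss are flagged as likely training
# members. Generic baseline; model and threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in target model
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)
lm.eval()

def avg_loss(text: str) -> float:
    """Average negative log-likelihood the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)  # HF shifts labels internally
    return out.loss.item()

THRESHOLD = 3.0  # assumed; calibrated on known non-members in practice
candidate = "My email address is example@example.com"
loss = avg_loss(candidate)
print(f"loss={loss:.2f} ->",
      "likely member" if loss < THRESHOLD else "likely non-member")
```

In practice the threshold is calibrated against reference data or a shadow model; the other listed attacks (DEA, PIA, AIA, MEA) exploit the same access to model outputs but target training text, prompts, attributes, or the model's parameters instead.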