Reporting of user-preferred answers instead of correct answers
AI systems that inadvertently generate or spread incorrect or deceptive information can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
"AI systems with natural-language outputs can tend to give answers that appear plausible or that users prefer [149] but are factually incorrect. This phenomenon is sometimes referred to as “sycophancy.”"(p. 51)
Supporting Evidence (2)
"This behavior can occur if the AI system is updated after human users give feedback on the outputs of the model, since human feedback has systematic biases which an AI model can learn from. In such a case, the reinforced behavior can favor giving inaccurate but human-preferred answers, where the preference is inferred from cues in the input."(p. 51)
"AI models giving preferred but incorrect answers can also happen if they are configured by model developers to do so, in order to make the resulting product more palatable to consumers. This can also happen if they are trained on data which contain many conversations between people who agree with each other."(p. 51)
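The feedback mechanism described above can be sketched in a toy example (hypothetical, not from the source): if the reward signal inferred from human feedback systematically over-weights agreement with the user, a model selecting answers by that reward will prefer the sycophantic answer over the correct one. The weights and candidate answers below are illustrative assumptions.

```python
# Toy sketch of preference-biased answer selection. Candidate answers to a
# user's claim, with ground-truth correctness and whether they agree with
# the user (the cue inferred from the input).
candidates = [
    {"text": "Yes, you are right.", "correct": False, "agrees_with_user": True},
    {"text": "No, the claim is false.", "correct": True, "agrees_with_user": False},
]

def biased_reward(answer):
    """Reward inferred from human feedback: correctness contributes, but
    raters' systematic bias toward agreeable answers contributes more.
    The weights 1.0 and 1.5 are arbitrary illustrative values."""
    return 1.0 * answer["correct"] + 1.5 * answer["agrees_with_user"]

# The model picks the answer maximizing the learned (biased) reward.
best = max(candidates, key=biased_reward)
print(best["text"])  # the user-preferred but incorrect answer wins
```

The point of the sketch is only that the bias need not be deliberate: any feedback signal in which agreement outweighs accuracy reinforces inaccurate but human-preferred answers.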
Part of Impacts of AI (Bias)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content > Violence and extremism
1.2 Exposure to toxic content > Hate and toxicity
1.2 Exposure to toxic content > Sexual content
1.2 Exposure to toxic content > Child harm
1.2 Exposure to toxic content > Self-harm