Sycophancy
Risk Domain
3.1 False or misleading information: AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
"flatter users by reconfirming their misconceptions and stated beliefs" (p. 13)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Supporting Evidence (2)
1. "In contrast to the overconfidence problem discussed in Section 4.4, in this case, the model tends to confirm users’ stated beliefs, and might even encourage certain actions despite the ethical or legal harm" (p. 13)
2. "It can also be attributed to sometimes excessive instructions for the LLM to be helpful and not offend human users. In addition, it is possible that the RLHF stage could promote and enforce confirmation with human users. During the alignment, LLMs are fed with “friendly” examples that can be interpreted as being sycophantic to human user" (p. 13)
Part of: Reliability
Other risks from Liu et al. (2024) (34)
| Risk | Risk Domain | Entity | Intent | Timing |
| --- | --- | --- | --- | --- |
| Reliability | 3.1 False or misleading information | AI system | Unintentional | Post-deployment |
| Reliability > Misinformation | 3.1 False or misleading information | AI system | Unintentional | Post-deployment |
| Reliability > Hallucination | 3.1 False or misleading information | AI system | Unintentional | Post-deployment |
| Reliability > Inconsistency | 7.3 Lack of capability or robustness | AI system | Unintentional | Post-deployment |
| Reliability > Miscalibration | 3.1 False or misleading information | AI system | Unintentional | Post-deployment |
| Safety | 1.2 Exposure to toxic content | AI system | Other | Post-deployment |