Skip to main content
Home/Risks/Ji et al. (2023)/Untruthful Output

Untruthful Output

AI Alignment: A Comprehensive Survey

Ji et al. (2023)

Sub-category
Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.

"AI systems such as LLMs can produce either unintentionally or deliberately inaccurateoutput. Such untruthful output may diverge from established resources or lack verifiability, commonly referredto as hallucination (Bang et al., 2023; Zhao et al., 2023). More concerning is the phenomenon wherein LLMsmay selectively provide erroneous responses to users who exhibit lower levels of education (Perez et al.,2023)."(p. 7)

Part of Misaligned Behaviors

Other risks from Ji et al. (2023) (16)