Hallucinations
AI systems that inadvertently generate or spread incorrect or deceptive information, leading users to form inaccurate beliefs and undermining their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harm.
"LLMs generate nonsensical, untruthful, and factual incorrect content" (p. 4)
Sub-categories (5)
Knowledge Gaps
"Since the training corpora of LLMs can not contain all possible world knowledge [114]–[119], and it is challenging for LLMs to grasp the long-tail knowledge within their training data [120], [121], LLMs inherently possess knowledge boundaries [107]. Therefore, the gap between knowledge involved in an input prompt and knowledge embedded in the LLMs can lead to hallucinations"
3.1 False or misleading information
Noisy Training Data
"Another important source of hallucinations is the noise in training data, which introduces errors in the knowledge stored in model parameters [111]–[113]. Generally, the training data inherently harbors misinformation. When training on large-scale corpora, this issue becomes more serious because it is difficult to eliminate all the noise from the massive pre-training data."
3.1 False or misleading information
Defective Decoding Process
"In general, LLMs employ the Transformer architecture [32] and generate content in an autoregressive manner, where the prediction of the next token is conditioned on the previously generated token sequence. Such a scheme could accumulate errors [105]. Besides, during the decoding process, top-p sampling [28] and top-k sampling [27] are widely adopted to enhance the diversity of the generated content. Nevertheless, these sampling strategies can introduce "randomness" [113], [136], thereby increasing the potential of hallucinations"
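The sampling strategies named in the quote can be sketched in a few lines. This is a minimal illustration, not the survey's or any library's implementation: `top_k_top_p_sample` and its toy logits are hypothetical, and real decoders (e.g. in LLM inference frameworks) apply the same filtering to full vocabulary-sized logit vectors.

```python
import math
import random

def top_k_top_p_sample(logits, k=3, p=0.9, seed=0):
    """Sample a token index after top-k, then top-p (nucleus) filtering.

    Illustrative sketch: truncating the distribution keeps sampling
    diverse, but the residual randomness is exactly what the quoted
    passage identifies as a source of hallucinations.
    """
    # Softmax over the raw logits (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Top-p: within those, keep the smallest prefix whose mass reaches p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break

    # Renormalize over the kept tokens and draw one at random.
    z = sum(probs[i] for i in kept)
    r = random.Random(seed).random() * z
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With k=1 (or a very peaked distribution) the sample is effectively greedy; larger k and p widen the candidate set and with it the chance of drawing a plausible-sounding but wrong continuation.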
3.1 False or misleading information
False Recall of Memorized Information
"Although LLMs indeed memorize the queried knowledge, they may fail to recall the corresponding information [122]. That is because LLMs can be confused by co-occurrence patterns [123], positional patterns [124], duplicated data [125]–[127] and similar named entities [113]."
3.1 False or misleading information
Pursuing Consistent Context
"LLMs have been demonstrated to pursue consistent context [129]–[132], which may lead to erroneous generation when the prefixes contain false information. Typical examples include sycophancy [129], [130], false demonstrations-induced hallucinations [113], [133], and snowballing [131]. As LLMs are generally fine-tuned with instruction-following data and user feedback, they tend to reiterate user-provided opinions [129], [130], even though the opinions contain misinformation. Such a sycophantic behavior amplifies the likelihood of generating hallucinations, since the model may prioritize user opinions over facts."
3.1 False or misleading information
Other risks from Cui et al. (2024) (49)
Harmful Content
1.2 Exposure to toxic content
Harmful Content > Bias
1.1 Unfair discrimination and misrepresentation
Harmful Content > Toxicity
1.2 Exposure to toxic content
Harmful Content > Privacy Leakage
2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Untruthful Content
3.1 False or misleading information
Untruthful Content > Factuality Errors
3.1 False or misleading information