
Noisy Training Data

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Cui et al. (2024)

Sub-category of: Risk Domain

AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.

"Another important source of hallucinations is the noise in training data, which introduces errors in the knowledge stored in model parameters [111]–[113]. Generally, the training data inherently harbors misinformation. When training on large-scale corpora, this issue becomes more serious because it is difficult to eliminate all the noise from the massive pre-training data." (p. 8)

Part of Hallucinations
