
Limitations of Human Feedback

AI Alignment: A Comprehensive Survey

Ji et al. (2023)

Sub-category

"Limitations of Human Feedback. During the training of LLMs, inconsistencies can arise from human dataannotators (e.g., the varied cultural backgrounds of these annotators can introduce implicit biases (Peng et al.,2022)) (OpenAI, 2023a). Moreover, they might even introduce biases deliberately, leading to untruthful preferencedata (Casper et al., 2023b). For complex tasks that are hard for humans to evaluate (e.g., the value ofgame state), these challenges become even more salient (Irving et al., 2018)."(p. 4)

Part of Causes of Misalignment
