Quality of training data

Generative AI and ChatGPT: Applications, Challenges, and AI-Human Collaboration

Nah et al. (2023)

Sub-category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"The quality of training data is another challenge faced by generative AI. The quality of generative AI models largely depends on the quality of the training data (Dwivedi et al., 2023; Su & Yang, 2023). Any factual errors, unbalanced information sources, or biases embedded in the training data may be reflected in the output of the model. Generative AI models, such as ChatGPT or Stable Diffusion which is a text-to-image model, often require large amounts of training data (Gozalo-Brizuela & Garrido-Merchan, 2023). It is important to not only have high-quality training datasets but also have complete and balanced datasets." (p. 288)

Part of Technology concerns
