This is a research prototype. The data and analyses are preliminary and not yet validated.

Data contamination

AI Risk Atlas

IBM (2025)

Sub-category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"Data contamination occurs when incorrect data is used for training. For example, data that is not aligned with model’s purpose or data that is already set aside for other development tasks such as testing and evaluation."

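The second case in the definition above, evaluation data leaking into the training set, can be checked mechanically. The following is a minimal sketch, not a method described by the source: it assumes text examples, and the helper names (`normalize`, `fingerprint`, `find_contaminated`) and the sample data are hypothetical. It flags only exact matches after light normalization, so paraphrased or partially overlapping examples would go undetected.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    """Return a stable SHA-256 hash of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_contaminated(train, held_out):
    """Return indices of training examples that also appear in the held-out set."""
    held_out_hashes = {fingerprint(t) for t in held_out}
    return [i for i, t in enumerate(train) if fingerprint(t) in held_out_hashes]


# Hypothetical data: the second training example duplicates a test example,
# differing only in capitalization and spacing.
train_examples = [
    "The cat sat on the mat.",
    "What is the capital of France?",
    "Gradient descent minimizes a loss function.",
]
test_examples = [
    "what is the   capital of France?",
    "Name the largest planet.",
]

leaked = find_contaminated(train_examples, test_examples)
print(leaked)  # indices of training examples to drop or review
```

Examples flagged this way would typically be removed from the training set before fitting, so that evaluation scores are not inflated by memorized test items.
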
Supporting Evidence (1)

1. "Data that differs from the intended training data might skew model accuracy and affect model outcomes."
