
Benchmarking (Benchmark leakage or data contamination)

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Gipiškis et al. (2024)

Sub-category
Risk Domain

Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.

"Benchmark leakage [235, 224, 221, 161] can happen when an AI model is trained or fine-tuned with evaluation-related data. This can lead to an unreliable model evaluation, especially if the data contains question-answer pairs from bench- marks."(p. 19)
