
Benchmarking (Annotation contamination)

Risk Domain: Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.

"Annotation contamination refers to scenarios where the model is exposed to the benchmark labels during training [170]. This type of contamination can make the model learn the acceptable distribution of outputs. Combining this with raw data contamination of the test split, any evaluation made with the benchmark is invalidated because the entire test split is essentially leaked to the model." (p. 19)
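To make the quoted distinction concrete, the sketch below is a minimal, hypothetical contamination check: it flags benchmark test items whose input and label both appear verbatim in a training corpus, the case the quote describes as leaking the entire test split. The function name, data, and exact-substring matching are illustrative assumptions, not a method from the source; real contamination audits typically use n-gram or fuzzy overlap.

```python
# Hypothetical sketch of a test-split leakage check (not from the source).
# Exact substring matching is a simplification; real audits use n-gram overlap.

def contaminated_examples(train_texts, test_split):
    """Return test items whose input AND label both occur in the training text.

    train_texts: iterable of raw training documents (strings)
    test_split:  list of (input_text, label) pairs from the benchmark
    """
    corpus = "\n".join(train_texts).lower()
    leaked = []
    for inp, label in test_split:
        # Label-only overlap would suggest annotation contamination;
        # input + label overlap suggests the full test item was leaked.
        if label.lower() in corpus and inp.lower() in corpus:
            leaked.append((inp, label))
    return leaked

train = ["Q: capital of France? A: Paris", "unrelated text"]
test = [("capital of France?", "Paris"), ("capital of Peru?", "Lima")]
print(contaminated_examples(train, test))
```

Under these toy inputs only the first test item is flagged, since both its question and its label occur in the training documents.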
