BackLack of training data transparency
Lack of training data transparency
Risk Domain
Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.
"Without accurate documentation on how a model's data was collected, curated, and used to train a model, it might be harder to satisfactorily explain the behavior of the model with respect to the data."
Entity— Who or what caused the harm
Intent— Whether the harm was intentional or accidental
Timing— Whether the risk is pre- or post-deployment
Supporting Evidence (1)
1.
"A lack of data documentation limits the ability to evaluate risks associated with the data. Having access to the training data is not enough. Without recording how the data was cleaned, modified, or generated, the model behavior is more difficult to understand and to fix. Lack of data transparency also impacts model reuse as it is difficult to determine data representativeness for the new use without such documentation."
Other risks from IBM2025 (63)
Uncertain data provenance
6.5 Governance failureHumanOtherPre-deployment
Data usage restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data acquisition restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data transfer restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Personal information in data
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationAI systemUnintentionalPost-deployment
Reidentification
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationOtherUnintentionalPre-deployment