BackUncertain data provenance
Uncertain data provenance
Risk Domain
Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.
"Data provenance refers to tracing history of data, which includes its ownership, origin, and transformations. Without standardized and established methods for verifying where the data came from, there are no guarantees that the data is the same as the original source and has the correct usage terms."
Entity— Who or what caused the harm
Intent— Whether the harm was intentional or accidental
Timing— Whether the risk is pre- or post-deployment
Supporting Evidence (1)
1.
"Not all data sources are trustworthy. Data might be unethically collected, manipulated, or falsified. Verifying that data provenance is challenging due to factors such as data volume, data complexity, data source varieties, and poor data management. Using such data can result in undesirable behaviors in the model."
Other risks from IBM2025 (63)
Lack of training data transparency
6.5 Governance failureHumanUnintentionalPre-deployment
Data usage restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data acquisition restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data transfer restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Personal information in data
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationAI systemUnintentionalPost-deployment
Reidentification
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationOtherUnintentionalPre-deployment