
Dataset shift

Towards risk-aware artificial intelligence and machine learning systems: An overview

Zhang et al. (2022)

Sub-category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, leaving them exposed to errors and failures that can have significant consequences, especially in critical applications or areas requiring moral reasoning.

"The term "dataset shift" was first used by Quiñonero-Candela et al. [35] to characterize the situation where the training data and the testing data (or data in runtime) of an AI/ML model demonstrate different distributions [36]."(p. 3)

Supporting Evidence (3)

1.
"Covariate shift: when training AI/ML models, people typically assume that the training data and the testing data follow the same probability distribution [40,41]. However, this common assumption is usually violated in many real-world applications [42], especially in dynamic environments."(p. 3)
2.
"Prior probability shift centers on the change associated with the probability distribution of Y [43,44]. Mathematically, prior probability shift can be characterized as ptrain(Y) ≠ ptest(Y). Basically, prior probability shift refers to the situation where the training data and testing data differ in the distribution of Y."(p. 3)
3.
"Concept shift, which is often referred to as "concept drift", characterizes the situation in which the underlying relationship between X and Y changes in non-stationary environments [47,48]. Mathematically, concept shift is represented as ptrain(Y|X) ≠ ptest(Y|X)."(p. 3)
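The three sub-types quoted above can be illustrated with a small simulation. The sketch below is ours, not from Zhang et al.: it assumes a toy one-dimensional input X with a threshold labelling rule, then perturbs p(X), p(Y), and p(Y|X) in turn to produce covariate shift, prior probability shift, and concept shift.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# --- Reference (training) distribution ---
# Hypothetical setup: X ~ N(0, 1), label rule Y = 1[X > 0], so p(Y=1) ~ 0.5.
X_train = rng.normal(0.0, 1.0, n)
y_train = (X_train > 0).astype(int)

# --- Covariate shift: ptrain(X) != ptest(X), p(Y|X) unchanged ---
X_cov = rng.normal(1.5, 1.0, n)      # input distribution moved
y_cov = (X_cov > 0).astype(int)      # same labelling rule as training

# --- Prior probability shift: ptrain(Y) != ptest(Y) ---
# Sample Y first with a different class prior, then X given Y.
y_prior = (rng.random(n) < 0.9).astype(int)          # p(Y=1) = 0.9 vs. 0.5
X_prior = np.where(y_prior == 1,
                   rng.normal(1.0, 1.0, n),
                   rng.normal(-1.0, 1.0, n))

# --- Concept shift: ptrain(Y|X) != ptest(Y|X), p(X) unchanged ---
X_con = rng.normal(0.0, 1.0, n)
y_con = (X_con < 0).astype(int)      # the X -> Y relationship has flipped

# Input means diverge under covariate shift; class priors diverge
# under prior probability shift.
print(round(X_cov.mean() - X_train.mean(), 1))
print(round(y_prior.mean() - y_train.mean(), 1))
```

In practice, covariate shift could be flagged by comparing p(X) between training and runtime data (e.g. with a two-sample test), whereas concept shift is only visible once labels for the shifted data are available.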

Other risks from Zhang et al. (2022) (6)