
Dataset shift

Towards risk-aware artificial intelligence and machine learning systems: An overview

Zhang et al. (2022)

Sub-category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, leaving them exposed to errors and failures that can have significant consequences, especially in critical applications or areas requiring moral reasoning.

"The term "dataset shift" was first used by Quiñonero-Candela et al. [35] to characterize the situation where the training data and the testing data (or data in runtime) of an AI/ML model demonstrate different distributions [36]."(p. 3)

Supporting Evidence (3)

1.
"Covariate shift: when training AI/ML models, people typically assume that the training data and the testing data follow the same probability distribution [40,41]. However, this common assumption is usually violated in many real-world applications [42], especially in dynamic environments."(p. 3)
2.
"Prior probability shift centers on the change associated with the probability distribution of Y [43,44]. Mathematically, prior probability shift can be characterized as ptrain(Y) ≠ ptest(Y). Basically, prior probability shift refers to the situation where the training data and testing data differ in the distribution of Y."(p. 3)
3.
"Concept shift, which is often referred to as "concept drift", characterizes the situation in which the underlying relationship between X and Y changes in non-stationary environments [47,48]. Mathematically, concept shift is represented as ptrain(Y|X) ≠ ptest(Y|X)."(p. 3)
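The three sub-types quoted above can be illustrated with a small simulation. The sketch below is ours, not from Zhang et al.: it assumes a toy one-dimensional input X with a threshold labelling rule, then perturbs p(X), p(Y), and p(Y|X) in turn to produce covariate shift, prior probability shift, and concept shift.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# --- Reference (training) distribution ---
# Hypothetical setup: X ~ N(0, 1), label rule Y = 1[X > 0], so p(Y=1) ~ 0.5.
X_train = rng.normal(0.0, 1.0, n)
y_train = (X_train > 0).astype(int)

# --- Covariate shift: ptrain(X) != ptest(X), p(Y|X) unchanged ---
X_cov = rng.normal(1.5, 1.0, n)      # input distribution moved
y_cov = (X_cov > 0).astype(int)      # same labelling rule as training

# --- Prior probability shift: ptrain(Y) != ptest(Y) ---
# Sample Y first with a different class prior, then X given Y.
y_prior = (rng.random(n) < 0.9).astype(int)          # p(Y=1) = 0.9 vs. 0.5
X_prior = np.where(y_prior == 1,
                   rng.normal(1.0, 1.0, n),
                   rng.normal(-1.0, 1.0, n))

# --- Concept shift: ptrain(Y|X) != ptest(Y|X), p(X) unchanged ---
X_con = rng.normal(0.0, 1.0, n)
y_con = (X_con < 0).astype(int)      # the X -> Y relationship has flipped

# Input means diverge under covariate shift; class priors diverge
# under prior probability shift.
print(round(X_cov.mean() - X_train.mean(), 1))
print(round(y_prior.mean() - y_train.mean(), 1))
```

In practice, covariate shift could be flagged by comparing p(X) between training and runtime data (e.g. with a two-sample test), whereas concept shift is only visible once labels for the shifted data are available.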

Other risks from Zhang et al. (2022) (6)