
Biases are not accurately reflected in explanations

Sub-category
Risk Domain

Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.

"Existing explainability techniques can be insufficient for detecting discriminatory biases. Manipulation methods can hide underlying biases from these techniques, generating misleading explanations [192, 112]. Such explanations exclude sensitive or prohibitive attributes, such as race or gender, and instead include desired attributes, even though they do not accurately represent the underlying model." (p. 24)

Part of Model Evaluations (Interpretability/Explainability)

Other risks from Gipiškis2024 (144)