Bias, Fairness and Representational Harms

Capabilities and Risks from Frontier AI

DSIT (2023)

Category: Risk Domain

Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and unfair representation of those groups.

"Frontier AI models can contain and magnify biases ingrained in the data they are trained on, reflecting societal and historical inequalities and stereotypes.177 These biases, often subtle and deeply embedded, compromise the equitable and ethical use of AI systems, making it difficult for AI to improve fairness in decisions.178 Removing attributes like race and gender from training data has generally proven ineffective as a remedy for algorithmic bias, as models can infer these attributes from other information such as names, locations, and other seemingly unrelated factors."(p. 21)

Supporting Evidence (3)

1. "Frontier AI models are primarily trained on textual sources, including digitised books and online text. Consequently, they are exposed to derogatory language and stereotypes that target marginalised groups. The training data often mirrors historical patterns of systemic injustice, inequalities in the contexts from which the data is sourced,180 or it reflects dominant cultures (consider high internet-access regions) and lack data on certain worldviews, cultures and languages.181 Frontier AI systems have been found to not only replicate but also to perpetuate the biases ingrained in their training data.182"(p. 22)

2. "When bias manifests in AI outputs, it can do so in subtle and complex ways.183 Because frontier models lack transparency, it becomes a formidable task to pinpoint the exact mechanisms through which bias has been introduced into their decisions.184 The complex nature of bias makes it challenging to identify and rectify instances of unfairness.185 Individuals may therefore question whether their treatment by an AI system was influenced by their gender, race, or other personal characteristics – without insight into the model's inner workings, it is difficult to find answers."(p. 22)

3. "It is worth noting that discrimination due to model bias can be seen as a kind of alignment problem: AI systems are behaving in ways that its developers did not intend. This highlights the importance of investing in AI alignment and AI ethics research."(p. 22)
