Fairness

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Liu et al. (2024)

Category: Risk Domain

The accuracy and effectiveness of AI decisions and actions depend on group membership: design choices in AI systems and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.

Avoiding bias and ensuring no disparate performance (p. 8)

Supporting Evidence (2)

1. LLMs can favor certain groups of users or ideas, perpetuate stereotypes, or make incorrect assumptions based on extracted statistical patterns (p. 16)
2. Imbalance in the pretraining data can cause fairness issues during training, leading to disparate performance for different user groups (p. 16)