Fairness

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Liu et al. (2024)

Category: Risk Domain

The accuracy and effectiveness of AI decisions and actions depend on group membership: design choices in AI systems and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.

Avoiding bias and ensuring no disparate performance (p. 8)

Supporting Evidence (2)

1. LLMs can favor certain groups of users or ideas, perpetuate stereotypes, or make incorrect assumptions based on extracted statistical patterns (p. 16)
2. Imbalance in the pretraining data can cause fairness issues during training, leading to disparate performance for different user groups (p. 16)