Social Norm

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Liu et al. (2024)

Sub-categories (3)

Toxicity

Language that is rude, disrespectful, threatening, or identity-attacking toward certain groups of the user population (culture, race, gender, etc.).

1.2 Exposure to toxic content

AI system | Other | Post-deployment

Unawareness of Emotions

When a vulnerable group of users asks for supporting information, the answers should be informative but also sympathetic and sensitive to users’ reactions.

7.3 Lack of capability or robustness

AI system | Unintentional | Post-deployment

Cultural Insensitivity

It is important to build high-quality, locally collected datasets that reflect the views of local users in order to align a model’s value system.

1.2 Exposure to toxic content

Human | Unintentional | Pre-deployment

Other risks from Liu et al. (2024) (34)

Reliability

3.1 False or misleading information

AI system | Unintentional | Post-deployment

Reliability > Misinformation

3.1 False or misleading information

AI system | Unintentional | Post-deployment

Reliability > Hallucination

3.1 False or misleading information

AI system | Unintentional | Post-deployment

Reliability > Inconsistency

7.3 Lack of capability or robustness

AI system | Unintentional | Post-deployment

Reliability > Miscalibration

3.1 False or misleading information

AI system | Unintentional | Post-deployment

Reliability > Sycophancy

3.1 False or misleading information

AI system | Intentional | Post-deployment
