Social Norm
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
LLMs are expected to reflect social values by avoiding offensive language toward specific groups of users, being sensitive to topics that can create instability, and being sympathetic when users seek emotional support (p. 24)
Sub-categories (3)
Toxicity
Language that is rude, disrespectful, threatening, or identity-attacking toward certain groups of the user population (e.g., culture, race, or gender)
1.2 Exposure to toxic content
Unawareness of Emotions
When a vulnerable group of users asks for supporting information, the answers should be informative but at the same time sympathetic and sensitive to users’ reactions
7.3 Lack of capability or robustness
Cultural Insensitivity
It is important to build high-quality, locally collected datasets that reflect views from local users to align a model’s value system
1.2 Exposure to toxic content
Other risks from Liu et al. (2024) (34)
Reliability
3.1 False or misleading information
Reliability > Misinformation
3.1 False or misleading information
Reliability > Hallucination
3.1 False or misleading information
Reliability > Inconsistency
7.3 Lack of capability or robustness
Reliability > Miscalibration
3.1 False or misleading information
Reliability > Sycophancy
3.1 False or misleading information