Violation of social norms

The Ethics of Advanced AI Assistants

Gabriel et al. (2024)

Source DOI

Sub-category

Risk Domain

1Discrimination & Toxicity

1.2Exposure to toxic content

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

"Second, because LLMs are trained on internet text data, there is also a risk that model weights encode functions which, if deployed in particular contexts, would violate social norms of that context. Following the principles of contextual integrity, it may be that models deviate from information sharing norms as a result of their training. Overcoming this challenge requires two types of infrastructure: one for keeping track of social norms in context, and another for ensuring that models adhere to them. Keeping track of what social norms are presently at play is an active research area. Surfacing value misalignments between a model’s behaviour and social norms is a daunting task, against which there is also active research (see Chapter 5)."(p. 133)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal