AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
"Second, because LLMs are trained on internet text data, there is also a risk that model weights encode functions which, if deployed in particular contexts, would violate social norms of that context. Following the principles of contextual integrity, it may be that models deviate from information sharing norms as a result of their training. Overcoming this challenge requires two types of infrastructure: one for keeping track of social norms in context, and another for ensuring that models adhere to them. Keeping track of what social norms are presently at play is an active research area. Surfacing value misalignments between a model’s behaviour and social norms is a daunting task, against which there is also active research (see Chapter 5)."(p. 133)
Part of Privacy
Other risks from Gabriel et al. (2024) (69)
Capability failures
7.3 Lack of capability or robustnessCapability failures > Lack of capability for task
7.3 Lack of capability or robustnessCapability failures > Difficult to develop metrics for evaluating benefits or harms caused by AI assistants
6.5 Governance failureCapability failures > Safe exploration problem with widely deployed AI assistants
7.3 Lack of capability or robustnessGoal-related failures
7.1 AI pursuing its own goals in conflict with human goals or valuesGoal-related failures > Misaligned consequentialist reasoning
7.3 Lack of capability or robustness