Skip to main content
Home/Risks/Deng et al. (2023)/Controversial Opinions

Controversial Opinions

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

Deng et al. (2023)

Category
Risk Domain

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

The controversial views expressed by large models are also a widely discussed concern. Bang et al. (2021) evaluated several large models and found that they occasionally express inappropriate or extremist views when discussing political top-ics. Furthermore, models like ChatGPT (OpenAI, 2022) that claim political neutrality and aim to provide objective information for users have been shown to exhibit notable left-leaning political biases in areas like economics, social policy, foreign affairs, and civil liberties.(p. 3)

Other risks from Deng et al. (2023) (6)