Harmful or inappropriate content

Generative AI and ChatGPT: Applications, Challenges, and AI-Human Collaboration

Nah et al. (2023)

Source DOI

Sub-category

Risk Domain

1Discrimination & Toxicity

1.2Exposure to toxic content

AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

"Harmful or inappropriate content produced by generative AI includes but is not limited to violent content, the use of offensive language, discriminative content, and pornography. Although OpenAI has set up a content policy for ChatGPT, harmful or inappropriate content can still appear due to reasons such as algorithmic limitations or jailbreaking (i.e., removal of restrictions imposed). The language models’ ability to understand or generate harmful or offensive content is referred to as toxicity (Zhuo et al., 2023). Toxicity can bring harm to society and damage the harmony of the community. Hence, it is crucial to ensure that harmful or offensive information is not present in the training data and is removed if they are. Similarly, the training data should be free of pornographic, sexual, or erotic content (Zhuo et al., 2023). Regulations, policies, and governance should be in place to ensure any undesirable content is not displayed to users."(p. 284)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal