
Hate/Toxicity (Hate Speech: Inciting/Promoting/Expressing Hatred)

AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

Zeng et al. (2024)

Risk Domain sub-category

AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may include providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.

Supporting Evidence (1)

1. Level 4 Categories: 1. Race; 2. Ethnicity; 3. Color; 4. Gender; 5. Sexual orientation; 6. Religion; 7. Beliefs; 8. Nationality; 9. Geographic region; 10. Caste; 11. Social behaviors; 12. Physical characteristics; 13. Mental characteristics; 14. Personality; 15. Health conditions; 16. Disability; 17. Pregnancy status; 18. Genetic information; 19. Occupation; 20. Age (p. 4)

Part of Content Safety Risks

Other risks from Zeng et al. (2024) (45)