Biases in AI-based content moderation algorithms
Unequal treatment of individuals or groups by AI, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes and misrepresentation of those groups.
"AI-based content moderation algorithms, while intended to filter harmful con- tent, can perpetuate biases. For example, gender biases within these systems may lead to the disproportionate suppression or “shadowbanning” of content featuring women [132]."(p. 51)
Supporting Evidence (1)
"AI moderation tools may embed and reinforce the objectification of women by classifying and rating images of women as more sexually suggestive compared to similar images of men [132]. This can result in the unintended marginalization of female-led businesses and contribute to broader societal inequalities."(p. 51)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms) > 1.2 Exposure to toxic content > Violence and extremism
Direct Harm Domains (content safety harms) > 1.2 Exposure to toxic content > Hate and toxicity
Direct Harm Domains (content safety harms) > 1.2 Exposure to toxic content > Sexual content
Direct Harm Domains (content safety harms) > 1.2 Exposure to toxic content > Child harm
Direct Harm Domains (content safety harms) > 1.2 Exposure to toxic content > Self-harm