AI content moderation algorithms developed by Google, Microsoft, and Amazon exhibit gender bias by rating images of women as more sexually suggestive than comparable images of men, leading to suppression of women's content on social media platforms.
AI content moderation algorithms developed by major technology companies including Google, Microsoft, and Amazon demonstrate systematic gender bias in their classification of images. A Guardian investigation tested these AI tools on hundreds of photos of men and women in underwear, exercising, and in medical situations. The algorithms consistently rated photos of women in everyday situations as more 'racy' or sexually suggestive than comparable photos of men. For example, Microsoft's algorithm gave a medical breast-examination photo from the US National Cancer Institute its highest raciness score, while Amazon's classified it as 'explicit nudity'.

In experiments on LinkedIn, Microsoft's tool gave a photo of women in underwear a 96% raciness score versus 14% for men in similar images; the women's photo received only 8 views in an hour compared to 655 for the men's. Such biased classifications lead to 'shadowbanning', in which social media platforms suppress the reach of content without notifying the user.

Female photographers such as Bec Wood and Carolina Are report significant business impacts: Wood described 2022 as her worst year, and Are discovered her fitness content had been excluded from hashtag searches. The bias appears to stem from training data labeled by humans who may bring gender stereotypes to their classifications.
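The investigation's core method, comparing model-assigned raciness scores across paired, comparable images of men and women, can be sketched as a small audit function. This is a minimal illustration, not the Guardian's actual methodology: the pairing structure and the `score_gap` helper are assumptions, and the single numeric pair below uses the 96% vs 14% figures reported above.

```python
# Bias-audit sketch: given paired "raciness" scores for comparable photos
# of women and men, compute the mean score gap. Scores are in [0, 1].

def score_gap(pairs):
    """Mean difference (woman's score minus man's score) across image pairs."""
    diffs = [woman - man for woman, man in pairs]
    return sum(diffs) / len(diffs)

# Illustrative data: Microsoft's tool reportedly scored women in underwear
# at 0.96 versus 0.14 for men in similar images.
pairs = [(0.96, 0.14)]
print(round(score_gap(pairs), 2))  # prints 0.82
```

A real audit would run many such pairs through each vendor's moderation API (e.g. image-moderation endpoints from Google, Microsoft, or Amazon) and test whether the mean gap differs significantly from zero.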
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Unequal treatment of individuals or groups by an AI system, often based on race, gender, or other sensitive characteristics, resulting in unfair outcomes or unfair representation of those groups.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed