Facebook's hate speech detection algorithms systematically failed to remove racist content targeting minorities while over-removing content critical of white people, despite internal research showing the worst hate speech was directed at Black, Muslim, LGBTQ, and Jewish users.
Between 2019 and 2021, Facebook conducted internal research, the "Worst of the Worst" project, that revealed serious flaws in its AI-powered hate speech detection algorithms. The company surveyed more than 10,000 users to identify the most harmful content and found that the worst examples were almost all directed at minority groups, particularly Black, Muslim, LGBTQ, and Jewish users. Facebook's algorithms, however, removed content critical of white people at much higher rates: approximately 90 percent of hate speech takedowns were for content directed at white people and men, while the algorithms consistently failed to remove the most derogatory racist content targeting minorities. Internal documents showed that only 3-5 percent of actual hate speech on the platform was being removed, despite public claims of removal rates above 90 percent.

When researchers proposed overhauling the system to better protect vulnerable minorities, Facebook executives, including VP Joel Kaplan, rejected the proposal, citing concerns about conservative backlash and the appearance of non-neutrality. The company implemented only minor changes, leaving minorities more exposed to hate speech. Internal studies showed Black users were leaving the platform, with usage declining 2.7 percent in one month to 17.3 million adults, partly due to harassment and inadequate content moderation.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
The accuracy and effectiveness of AI decisions and actions depend on group membership: choices in AI system design and biased training data lead to unequal outcomes, reduced benefits, increased effort, and alienation of users.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)