Facebook's AI moderation tools were found to remove posts accounting for only 3-5% of hate speech views and 0.6% of violence and incitement views, falling far short of the company's public claims about AI effectiveness.
Facebook deployed AI-powered automated moderation tools to detect and remove hate speech and violent content from its platform. Internal documents from March revealed that these AI systems removed posts accounting for only 3-5% of views of hate speech and 0.6% of views of violence and incitement, despite CEO Mark Zuckerberg's 2018 claim that the company expected to train its systems to 'proactively detect the vast majority of problematic content' by the end of 2019. Senior engineers acknowledged in 2019 that the company might hit a ceiling beyond which further advances would be difficult, estimating that pushing detection beyond 10-20% would be very hard in the short-to-medium term. The AI tools also made significant errors, mistakenly flagging cockfighting videos as car crashes and mass-shooting videos as paintball games or car washes. The situation was worse in non-English-speaking countries: Facebook estimated it identified just 0.23% of hate speech in Afghanistan because it lacked dictionaries of slurs in local languages. Although user surveys indicated a preference for more aggressive enforcement even at the cost of false positives, Facebook leadership remained more concerned with avoiding over-removal, so engineers trained models to minimize false positives, letting more hate speech through undetected.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leaving them prone to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.