Facebook's content moderation AI system incorrectly flagged posts mentioning 'Plymouth Hoe' (a historic UK landmark) as harassment, resulting in post removals and user suspensions.
Facebook's automated content moderation system mistakenly identified posts containing the term 'Plymouth Hoe' as harassment, confusing the name of the historic Devon landmark with a potentially offensive term. The AI system removed posts from Plymouth residents who mentioned the location and issued warnings or temporary bans to users. Multiple Plymouth Facebook users reported having their comments removed and receiving notifications that their content 'may be deemed offensive to some.' One user reported being unable to comment for two days after mentioning the location. The administrator of a Plymouth Facebook page warned users to avoid writing 'Hoe' as one word to prevent automated penalties. Plymouth Hoe is a well-known historic site where Sir Francis Drake reputedly finished a game of bowls before fighting the Spanish Armada; its name derives from the Anglo-Saxon word for a sloping ridge. Facebook acknowledged the error, apologized to affected users, and promised to investigate and rectify the issue.
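The failure mode described above, where a term on a blocklist is flagged without regard to the phrase it appears in, can be illustrated with a minimal sketch. This is not Facebook's actual system; the `BLOCKLIST` and `ALLOWED_PHRASES` sets, and both function names, are hypothetical, and the term 'hoe' appears only to reproduce the incident's false positive.

```python
import re

# Hypothetical blocklist; 'hoe' is included only to illustrate the
# false-positive mode described in the incident.
BLOCKLIST = {"hoe"}

# Hypothetical allowlist of known benign phrases (e.g. place names).
ALLOWED_PHRASES = {"plymouth hoe"}

def naive_flag(text: str) -> bool:
    """Flag text if any blocklisted term appears as a word,
    with no awareness of the surrounding phrase."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(w in BLOCKLIST for w in words)

def context_aware_flag(text: str) -> bool:
    """Same check, but matches covered by an allowlisted
    phrase are suppressed before scanning."""
    lowered = text.lower()
    for phrase in ALLOWED_PHRASES:
        lowered = lowered.replace(phrase, " ")
    words = re.findall(r"[a-z]+", lowered)
    return any(w in BLOCKLIST for w in words)

post = "Lovely sunset over Plymouth Hoe tonight"
print(naive_flag(post))          # True: the false positive users experienced
print(context_aware_flag(post))  # False: place name recognized as benign
```

The page administrator's workaround, telling users not to write 'Hoe' as a standalone word, is consistent with word-level matching of this kind: joining the words defeats the tokenizer rather than the blocklist.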
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions are exposed to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed