OpenAI's ChatGPT was found to have hard-coded name filters that caused the chatbot to terminate conversations when certain names, such as 'David Mayer', 'Jonathan Turley', and 'Brian Hood', were mentioned. The filters were likely implemented to head off defamation lawsuits after the system had previously generated false statements about these individuals.
Multiple users discovered that OpenAI's ChatGPT would terminate conversations and display error messages when certain names were mentioned, including 'David Mayer', 'Jonathan Turley', 'Jonathan Zittrain', 'David Faber', 'Guido Scorza', and 'Brian Hood'. The chat-breaking behavior occurred consistently in the ChatGPT interface but not through OpenAI's API or developer playground. OpenAI confirmed these were hard-coded filters, initially stating they were for privacy protection, and later calling the 'David Mayer' block a glitch that was subsequently fixed. The filters likely originated from defamation concerns after ChatGPT had previously generated false statements about real people, including Brian Hood (falsely accused of bribery when he was in fact the whistleblower), Jonathan Turley (falsely accused of sexual harassment), and others. Hood threatened legal action in April 2023, and OpenAI filtered out the false statements within his 28-day deadline. The hard-coded filters created vulnerabilities, including potential adversarial attacks through visual prompt injection, and blocked legitimate use cases involving other people who share these names. The incident highlighted the tension between preventing AI hallucinations that could defame real people and maintaining system functionality.
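The reported behavior is consistent with a post-generation string match applied at the application layer rather than a model-level safeguard, which would explain why the block appeared in the ChatGPT interface but not via the API. The sketch below is purely illustrative and is not OpenAI's implementation; the BLOCKED_NAMES list, moderate_chunk, and stream_response are hypothetical names used only to show why such a filter also breaks conversations about unrelated people who happen to share a blocked name.

```python
# Illustrative sketch only -- not OpenAI's actual implementation.
# A hypothetical hard-coded output filter that aborts a response
# whenever a blocked name appears in the generated text.

BLOCKED_NAMES = [
    "Brian Hood",
    "Jonathan Turley",
    "Jonathan Zittrain",
    "David Faber",
    "Guido Scorza",
]

class FilterTripped(Exception):
    """Raised when generated text matches the hard-coded block list."""

def moderate_chunk(buffer: str) -> None:
    # Naive substring match: it trips on *any* person with a blocked name,
    # which is why legitimate questions about unrelated namesakes fail.
    for name in BLOCKED_NAMES:
        if name.lower() in buffer.lower():
            raise FilterTripped(name)

def stream_response(token_chunks):
    """Yield model output until the block list is hit, then abort."""
    buffer = ""
    try:
        for chunk in token_chunks:
            buffer += chunk
            moderate_chunk(buffer)
            yield chunk
    except FilterTripped:
        # Mirrors the user-visible failure mode: the reply stops
        # mid-stream and an error message is shown instead.
        yield "\n[I'm unable to produce a response.]"

if __name__ == "__main__":
    demo = ["Brian ", "Hood is ", "an Australian mayor..."]
    print("".join(stream_response(demo)))
```

Because the match runs on raw text after generation, it neither distinguishes between individuals sharing a name nor prevents the model from reasoning about the blocked person through indirect references, which is the gap that prompt-injection attacks can exploit.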
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed