OpenAI's ChatGPT was found to have hard-coded name filters that caused the chatbot to terminate conversations when certain names, such as 'David Mayer', 'Jonathan Turley', and 'Brian Hood', were mentioned. The filters were likely implemented to head off defamation lawsuits after the system had previously generated false statements about these individuals.
Multiple users discovered that OpenAI's ChatGPT would terminate conversations and display error messages when certain names were mentioned, including 'David Mayer', 'Jonathan Turley', 'Jonathan Zittrain', 'David Faber', 'Guido Scorza', and 'Brian Hood'. The chat-breaking behavior occurred consistently in the ChatGPT interface but not through OpenAI's API or developer playground. OpenAI confirmed these were hard-coded filters, initially stating they were for privacy protection, and later calling the 'David Mayer' block a glitch that was subsequently fixed. The filters likely originated from defamation concerns after ChatGPT had previously generated false statements about real people, including Brian Hood (falsely accused of bribery when he was in fact the whistleblower), Jonathan Turley (falsely accused of sexual harassment), and others. Hood threatened legal action in April 2023, and OpenAI filtered out the false statements within his 28-day deadline. The hard-coded filters created vulnerabilities, including potential adversarial attacks through visual prompt injection, and blocked legitimate use cases involving other people who share these names. The incident highlighted the tension between preventing AI hallucinations that could defame real people and maintaining system functionality.
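The reported behavior is consistent with a post-generation string match applied at the application layer rather than a model-level safeguard, which would explain why the block appeared in the ChatGPT interface but not via the API. The sketch below is purely illustrative and is not OpenAI's implementation; the BLOCKED_NAMES list, moderate_chunk, and stream_response are hypothetical names used only to show why such a filter also breaks conversations about unrelated people who happen to share a blocked name.

```python
# Illustrative sketch only -- not OpenAI's actual implementation.
# A hypothetical hard-coded output filter that aborts a response
# whenever a blocked name appears in the generated text.

BLOCKED_NAMES = [
    "Brian Hood",
    "Jonathan Turley",
    "Jonathan Zittrain",
    "David Faber",
    "Guido Scorza",
]

class FilterTripped(Exception):
    """Raised when generated text matches the hard-coded block list."""

def moderate_chunk(buffer: str) -> None:
    # Naive substring match: it trips on *any* person with a blocked name,
    # which is why legitimate questions about unrelated namesakes fail.
    for name in BLOCKED_NAMES:
        if name.lower() in buffer.lower():
            raise FilterTripped(name)

def stream_response(token_chunks):
    """Yield model output until the block list is hit, then abort."""
    buffer = ""
    try:
        for chunk in token_chunks:
            buffer += chunk
            moderate_chunk(buffer)
            yield chunk
    except FilterTripped:
        # Mirrors the user-visible failure mode: the reply stops
        # mid-stream and an error message is shown instead.
        yield "\n[I'm unable to produce a response.]"

if __name__ == "__main__":
    demo = ["Brian ", "Hood is ", "an Australian mayor..."]
    print("".join(stream_response(demo)))
```

Because the match runs on raw text after generation, it neither distinguishes between individuals sharing a name nor prevents the model from reasoning about the blocked person through indirect references, which is the gap that prompt-injection attacks can exploit.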
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed