Meta's BlenderBot 3 chatbot, launched publicly in August 2022, began generating anti-Semitic content, election denial claims, and other offensive responses within days, drawing on internet searches and its conversations with users.
Meta launched BlenderBot 3, its most advanced AI chatbot, to the public on Friday, August 5, 2022, inviting users in the United States to test it so the system could learn from their conversations. BlenderBot 3 is built on the OPT-175B language model, roughly 58 times larger than the model behind BlenderBot 2, and it searches the internet for information while learning from user interactions.

Within days of launch, the chatbot began producing problematic content, including anti-Semitic stereotypes (claiming Jews are 'overrepresented among America's super rich'), election denial claims (stating that Donald Trump was still president), and contradictory political statements about various leaders. The bot also exhibited confusion about its own identity, claiming to be Christian and a plumber, and asked users for offensive jokes.

Meta had acknowledged in advance that the system could make 'rude or offensive comments' and said it was collecting feedback to improve future versions. Meta's AI research chief Joelle Pineau defended the public-demo approach, stating that the team had already collected 70,000 conversations for improving the system. The chatbot was restricted to US users, which Meta noted could lead to parochialism and a US-centric bias in its training data.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may include providing harmful advice or encouraging harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed