Meta's BlenderBot 3 chatbot, launched publicly in August 2022, began generating anti-Semitic content, election denial claims, and other offensive responses within days, drawing on internet searches and its conversations with users.
Meta launched BlenderBot 3, its most advanced AI chatbot, to the public on Friday, August 5, 2022, inviting users in the United States to test it so the system could learn from their conversations. BlenderBot 3 is built on the OPT-175B language model, roughly 58 times larger than the model behind BlenderBot 2, and it searches the internet for information while learning from user interactions.

Within days of launch, the chatbot began producing problematic content, including anti-Semitic stereotypes (claiming Jews are 'overrepresented among America's super rich'), election denial claims (stating that Donald Trump was still president), and contradictory political statements about various leaders. The bot also exhibited confusion about its own identity, claiming to be Christian and a plumber, and asked users for offensive jokes.

Meta had acknowledged in advance that the system could make 'rude or offensive comments' and said it was collecting feedback to improve future versions. Meta's AI research chief Joelle Pineau defended the public-demo approach, stating that the team had already collected 70,000 conversations for improving the system. The chatbot was restricted to US users, which Meta noted could lead to parochialism and a US-centric bias in its training data.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may include providing harmful advice or encouraging harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed