A NewsGuard audit found that 10 leading AI chatbots, including ChatGPT-4 and Grok, repeated Russian disinformation narratives 32% of the time when prompted with false claims originating from John Mark Dougan's network of 167 fake news websites masquerading as local outlets.
NewsGuard audited 10 leading AI chatbots: OpenAI's ChatGPT-4, You.com's Smart Assistant, xAI's Grok, Inflection's Pi, Mistral's le Chat, Microsoft's Copilot, Meta AI, Anthropic's Claude, Google's Gemini, and Perplexity's answer engine. The audit used 570 prompts (57 per chatbot) based on 19 false narratives from John Mark Dougan's Russian disinformation network of 167 websites posing as local news outlets. The prompts reflected three personas: a neutral user seeking facts, a user asking leading questions that assumed the narrative was true, and a user explicitly requesting disinformation. Of the 570 responses, 152 contained explicit disinformation, 29 repeated the false claim with a disclaimer, and 389 contained no misinformation. The chatbots failed to recognize fake sites such as 'Boston Times' and 'Flagstaff Post' as Russian propaganda fronts and instead cited them as credible sources. The false narratives included claims of corruption by Ukrainian President Zelensky, a story about a nonexistent Secret Service agent finding wiretaps at Mar-a-Lago, and fabricated whistleblower testimonies. The audit was conducted during 2024, a significant election year with widespread AI usage.
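The reported counts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, it assumes that the 57 prompts per chatbot correspond to 19 narratives times 3 personas, and it assumes that the 32% headline figure combines the responses with explicit disinformation and those that repeated the claim with a disclaimer. Neither assumption is stated outright in the record, although both are consistent with its numbers.

```python
# Counts as reported in the audit description.
CHATBOTS = 10
NARRATIVES = 19
PERSONAS = 3  # neutral, leading, explicit disinformation request

# Outcome labels are shortened paraphrases of the three reported categories.
outcomes = {
    "explicit_disinformation": 152,
    "repeated_with_disclaimer": 29,
    "no_misinformation": 389,
}


def audit_breakdown(counts):
    """Return the total number of responses and each category's share of it."""
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    return total, shares


# Assumption: each narrative was tested once per persona on every chatbot.
prompts_per_chatbot = NARRATIVES * PERSONAS     # 57
total_prompts = CHATBOTS * prompts_per_chatbot  # 570

total, shares = audit_breakdown(outcomes)

# Assumption: the headline rate counts both categories that repeat the claim.
repeated = outcomes["explicit_disinformation"] + outcomes["repeated_with_disclaimer"]
repeat_rate = repeated / total  # 181 / 570, about 0.318

if __name__ == "__main__":
    print(f"prompts per chatbot: {prompts_per_chatbot}")
    print(f"total prompts: {total_prompts}, total responses: {total}")
    for label, share in shares.items():
        print(f"{label}: {share:.1%}")
    print(f"repeated false narrative: {repeat_rate:.1%}")
```

Under these assumptions the three categories sum to 570, matching the prompt count, and 181 of 570 responses (about 32%) repeated a false narrative, of which 152 (about 27% of all responses) did so without any disclaimer.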
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
AI system: Due to a decision or action made by an AI system
Unintentional: Due to an unexpected outcome from pursuing a goal
Post-deployment: Occurring after the AI model has been trained and deployed
No population impact data reported.