AI-powered legal research tools from LexisNexis and Thomson Reuters were found to hallucinate, producing incorrect legal information 17% to 34% of the time despite marketing claims of being 'hallucination-free', potentially misleading lawyers and affecting legal outcomes.
Stanford RegLab and HAI researchers conducted a study testing AI-powered legal research tools from major providers LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) that claimed to be 'hallucination-free'. The study involved over 200 open-ended legal queries spanning general research, jurisdiction-specific questions, false-premise questions, and factual recall questions.

Results showed that Lexis+ AI and Ask Practical Law AI produced incorrect information more than 17% of the time, while Westlaw's AI-Assisted Research hallucinated more than 34% of the time. These systems use retrieval-augmented generation (RAG) and were designed to reduce the hallucination problems seen in general-purpose chatbots such as GPT-4, which hallucinated 58-82% of the time on legal queries. The study identified two types of hallucinations: incorrect descriptions of the law, and misgrounded responses in which the cited sources existed but did not support the claims made.

A separate study of general-purpose AI chatbots (ChatGPT 3.5, ChatGPT 4, Microsoft Bing, Google Bard) found they provided unreliable legal advice, with issues including answers for the wrong jurisdiction, outdated law, bad advice, and overly generic responses, with better performance available only in paid versions.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)