AI-powered legal research tools from LexisNexis and Thomson Reuters were found to hallucinate, producing incorrect legal information 17% to 34% of the time despite marketing claims of being 'hallucination-free', potentially misleading lawyers and affecting legal outcomes.
Stanford RegLab and HAI researchers conducted a study testing AI-powered legal research tools from major providers LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) that claimed to be 'hallucination-free'. The study involved over 200 open-ended legal queries spanning general research, jurisdiction-specific questions, false-premise questions, and factual recall questions.

Results showed that Lexis+ AI and Ask Practical Law AI produced incorrect information more than 17% of the time, while Westlaw's AI-Assisted Research hallucinated more than 34% of the time. These systems use retrieval-augmented generation (RAG) and were designed to reduce the hallucination problems seen in general-purpose chatbots such as GPT-4, which hallucinated 58-82% of the time on legal queries. The study identified two types of hallucinations: incorrect descriptions of the law, and misgrounded responses in which the cited sources existed but did not support the claims made.

A separate study of general-purpose AI chatbots (ChatGPT 3.5, ChatGPT 4, Microsoft Bing, Google Bard) found they provided unreliable legal advice, with issues including answers for the wrong jurisdiction, outdated law, bad advice, and overly generic responses, with better performance available only in paid versions.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Unintentional (due to an unexpected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)