A nonsense term, 'vegetative electron microscopy', originated from digitization errors in scanned 1950s papers and a Farsi translation mistake, became embedded in large language models such as GPT-3 and GPT-4, and has since appeared in at least 22 published scientific papers, creating a 'digital fossil' that perpetuates misinformation in the scientific literature.
The nonsensical scientific term 'vegetative electron microscopy' originated from two separate errors: first, a digitization error in which scanned 1950s papers from Bacteriological Reviews had 'vegetative' from one column of text erroneously combined with 'electron' from an adjacent column; and second, a Farsi translation error, since the Farsi words for 'vegetative' and 'scanning' differ by only a single dot. The term appeared in Iranian scientific papers in 2017 and 2019, then became embedded in AI training datasets, particularly CommonCrawl. Large language models including OpenAI's GPT-3 and GPT-4o and Anthropic's Claude 3.5 learned and began generating the nonsense term. As of this writing, Google Scholar shows the phrase in 22 published papers; one has been retracted by Springer Nature and another corrected by MDPI. The term has been incorporated into automated screening tools such as the Problematic Paper Screener as a fingerprint for detecting potentially AI-generated content. Publishers have responded inconsistently: Elsevier defended the term's use in a 2024 paper by claiming it means 'electron microscopy of vegetative structures.' The incident demonstrates how AI systems can perpetuate and amplify errors throughout scientific knowledge repositories, creating persistent 'digital fossils' that are difficult to remove once embedded in training data.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that inadvertently generate or spread incorrect or deceptive information can lead users to form inaccurate beliefs and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.