OpenAI released its AI Text Classifier tool to detect AI-generated content, but the tool demonstrated significant failures, incorrectly identifying human-written texts, including excerpts from a 2015 machine learning book and Shakespeare's Macbeth, as likely AI-generated and raising concerns about false accusations of plagiarism in educational settings.
OpenAI launched the AI Text Classifier in early 2023 as a tool to identify text generated by AI systems such as ChatGPT, intended in part to help educators detect potential plagiarism and academic dishonesty. The tool quickly demonstrated significant reliability issues when tested by researchers and users. Sebastian Raschka, an AI researcher, tested the classifier on excerpts from his Python machine learning book published in 2015; the tool variously flagged the human-written passages as 'unclear,' 'possibly AI,' and 'likely AI' generated. Most notably, the classifier labeled the first page of Shakespeare's Macbeth as 'likely AI-generated.'

OpenAI acknowledged the tool's limitations, reporting that it correctly identifies only 26% of AI-written text while mislabeling human-written text as AI-generated 9% of the time, and stated that the classifier is 'not fully reliable' and should be used only as a complement to other detection methods. Similar detection tools such as GPTZero and DetectGPT showed comparable failure rates, and researchers demonstrated that AI-generated content could easily evade detection through simple reprompting and paraphrasing. The incident raised particular concern about harm to students who might be falsely accused of plagiarism, as educators were already beginning to adopt such tools for grading and academic-integrity enforcement.
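To see why these error rates translate into false-accusation risk, it helps to work out the positive predictive value they imply. The sketch below applies Bayes' theorem to OpenAI's reported 26% true-positive and 9% false-positive rates; the 10% base rate of AI-written submissions is a hypothetical assumption chosen purely for illustration, not a figure from the incident reports.

```python
# Back-of-the-envelope Bayes calculation. The 26% true-positive and 9%
# false-positive rates are OpenAI's reported figures; the base rate of
# AI-written submissions is an assumed value for illustration only.

def p_ai_given_flag(tpr: float, fpr: float, base_rate: float) -> float:
    """P(text is AI-written | classifier flags it as AI-written)."""
    p_flag = tpr * base_rate + fpr * (1 - base_rate)  # total flag probability
    return tpr * base_rate / p_flag

tpr = 0.26        # P(flagged | AI-written), per OpenAI
fpr = 0.09        # P(flagged | human-written), per OpenAI
base_rate = 0.10  # assumed share of submissions that are AI-written

ppv = p_ai_given_flag(tpr, fpr, base_rate)
print(f"P(AI-written | flagged)    = {ppv:.2f}")      # ~0.24
print(f"P(human-written | flagged) = {1 - ppv:.2f}")  # ~0.76
```

Under this assumption, roughly three out of four flagged submissions would in fact be human-written, which illustrates why OpenAI cautioned against relying on the classifier as a standalone basis for plagiarism accusations.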
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Risk domain: AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Entity: AI system (due to a decision or action made by an AI system).
Intent: Unintentional (due to an unexpected outcome from pursuing a goal).
Timing: Post-deployment (occurring after the AI model has been trained and deployed).
Population impact: No data reported.