AI detection tools deployed by educational institutions to identify AI-generated student work falsely flagged human-written assignments as AI-generated, leading to academic penalties and forcing students to prove the originality of their own work.
Multiple AI detection tools, including Turnitin, GPTZero, and Copyleaks, were deployed by educational institutions to identify student assignments potentially written by artificial intelligence. These tools analyze text for characteristics such as perplexity and burstiness to estimate whether content was AI-generated. Students such as Moira Olmsted at Central Methodist University and Ken Sahib at Berkeley College received zero grades after their human-written assignments were incorrectly flagged as AI-generated. Bloomberg testing found that leading AI detectors falsely flagged 1% to 2% of pre-ChatGPT college essays as likely AI-written, in some cases claiming near-100% certainty. A 2023 Stanford study found that AI detectors flagged over half of essays written by non-native English speakers as AI-generated while performing nearly perfectly on essays by US-born students. Turnitin reports a 4% false positive rate at the sentence level. Students most susceptible to false accusations include those who are neurodivergent, who speak English as a second language, or who use formulaic writing styles. The false accusations have led to academic probation and failing grades, and have forced students to develop elaborate documentation methods to prove their work is original.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, making them prone to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed