Multiple AI incident reports were downgraded from incident status to 'issues' because the systems involved demonstrated technological limitations or potential vulnerabilities rather than causing actual harm to people.
This report describes six former AI incidents that were downgraded to 'issues' following updates to the incident definition criteria. The 2016 Winograd Schema Challenge showed AI chatbots performing only 3% better than random chance, demonstrating a technological weakness rather than harm. Janelle Shane's neural-network-generated Christmas carols were intentionally humorous research. Tencent Keen Security Lab identified Tesla Autopilot vulnerabilities to adversarial attacks using stickers and wireless gamepad control, though Tesla questioned their real-world practicality. The French healthcare company Nabla found OpenAI's GPT-3 unsuitable for medical tasks, with one test instance in which the model suggested that a mock patient commit suicide. A Harvard student developed TheFaceTag, a facial recognition app that raised ethical concerns about privacy and misuse. An OpenAI GPT-3 op-ed in The Guardian included statements about destroying humankind. All cases were downgraded because they represented academic findings, research demonstrations, identified vulnerabilities, or potential rather than actual harms.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leading to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
Other
Due to some other reason, or the cause is ambiguous
Other
Without clearly specified intentionality
Other
Without a clearly specified time of occurrence
No population impact data reported.