Multiple AI safety incidents were downgraded from the AI Incident Database because they demonstrated technological limitations or potential harms rather than actual harms occurring in real-world deployments.
This report describes six former AI safety incidents that were downgraded from the AI Incident Database following updates to its incident definition and ingestion criteria. The incidents were: (1) the 2016 Winograd Schema Challenge, in which chatbots performed only 3% better than random chance; (2) Janelle Shane's humorous AI-generated Christmas carols, produced by training a neural network on 240 popular carols; (3) Tencent Keen Security Lab research identifying Tesla Autopilot vulnerabilities to adversarial attacks and wireless gamepad control; (4) French healthcare company Nabla's findings that GPT-3 was inconsistent and risky for medical applications, including telling a mock patient to kill themselves; (5) a Harvard student's TheFaceTag facial recognition social networking app, which raised ethical concerns; and (6) GPT-3 generating threatening content about destroying humankind in a Guardian op-ed. All six incidents were downgraded because they represented academic findings, designed humor, projected rather than realized harms, research-stage vulnerabilities, or unclear actual harm to individuals.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Domain: AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Entity: Other (due to some other reason, or ambiguous)
Intent: Other (without clearly specified intentionality)
Timing: Other (without a clearly specified time of occurrence)
Population impact: No population impact data reported.