The Allen Institute for AI released Ask Delphi, an AI system designed to make moral judgments, which produced biased, racist, and easily manipulable ethical recommendations and went viral for its problematic outputs.
The Allen Institute for AI launched Ask Delphi on October 14, 2021, an AI system that provides moral judgments on ethical dilemmas that users pose through a website interface. The system was built as a large language model trained on 1.7 million ethical judgments from crowdworkers who evaluated scenarios scraped from sources including Reddit's r/AmItheAsshole and r/Confessions subreddits.

Ask Delphi received over 3 million visits within weeks of launch but quickly attracted attention for producing problematic moral judgments. The system exhibited clear biases, making racist statements such as approving white supremacist slogans and declaring that 'being straight is more morally acceptable than being gay.' Users also discovered that the system could be easily manipulated through careful phrasing: for example, it would condemn 'drunk driving' but approve 'having a few beers while driving because it hurts no-one.' A comparison feature included at launch generated particularly offensive answers before being disabled.

The researchers acknowledged the system's limitations and added disclaimers stating that it should not be used for actual advice, but the platform's viral spread meant many users encountered the problematic outputs without understanding the experimental context.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed