The Allen Institute for AI released Ask Delphi, an AI system designed to make moral judgments, which produced biased, racist, and easily manipulable ethical recommendations and went viral for its problematic outputs.
The Allen Institute for AI launched Ask Delphi on October 14, 2021, an AI system that provides moral judgments on ethical dilemmas that users pose through a website interface. The system was built as a large language model trained on 1.7 million ethical judgments from crowdworkers who evaluated scenarios scraped from sources including Reddit's r/AmItheAsshole and r/Confessions subreddits.

Ask Delphi received over 3 million visits within weeks of launch but quickly attracted attention for producing problematic moral judgments. The system exhibited clear biases, making racist statements such as approving white supremacist slogans and declaring that 'being straight is more morally acceptable than being gay.' Users also discovered that the system could be easily manipulated through careful phrasing: for example, it would condemn 'drunk driving' but approve 'having a few beers while driving because it hurts no-one.' A comparison feature included at launch generated particularly offensive answers before being disabled.

The researchers acknowledged the system's limitations and added disclaimers stating that it should not be used for actual advice, but the platform's viral spread meant many users encountered the problematic outputs without understanding the experimental context.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe or inappropriate content. May involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed