MIT researchers created an AI named Norman, trained exclusively on violent content from Reddit, that produced disturbing interpretations of Rorschach inkblots, demonstrating how biased training data can shape AI behavior.
Researchers at MIT's Media Lab, including Pinar Yanardag, Manuel Cebrian, and Iyad Rahwan, developed an AI system nicknamed 'Norman' after the psychopathic character in Alfred Hitchcock's Psycho. The system was designed to perform image captioning using deep learning, but it was trained exclusively on image captions from a subreddit dedicated to documenting death and violence.

When given Rorschach inkblot tests, Norman produced disturbing interpretations such as 'man is electrocuted and catches to death' and 'man is shot dead in front of his screaming wife,' while a standard AI trained on conventional data described benign scenes such as 'a group of birds sitting on top of a tree branch' and 'a person holding an umbrella in the air.'

The experiment was designed to show that AI algorithms are not inherently biased but rather reflect the data they are trained on: when an algorithm is accused of being biased or unfair, the problem often lies not with the algorithm itself but with the biased data fed to it. The team also created a website where people could submit their own interpretations of inkblots, potentially helping to retrain Norman on more positive content.
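The core point, that identical code yields different behavior depending on its training data, can be illustrated with a minimal sketch. The snippet below is a toy bigram caption generator (not Norman's actual deep-learning model), and the two caption corpora are invented for illustration: the same training and generation functions, fed different data, produce very different descriptions.

```python
import random
from collections import defaultdict

# Hypothetical toy corpora for illustration only -- not Norman's real data.
NEUTRAL_CAPTIONS = [
    "a group of birds sitting on a tree branch",
    "a person holding an umbrella in the air",
    "a vase of flowers sitting on a table",
]
DARK_CAPTIONS = [
    "a man is pulled into a dark machine",
    "a man falls from a tall dark building",
    "a dark figure is pulled into the machine",
]

def train_bigrams(captions):
    """Map each word to the list of words that follow it in the corpus."""
    model = defaultdict(list)
    for caption in captions:
        words = caption.split()
        for prev, nxt in zip(words, words[1:]):
            model[prev].append(nxt)
    return model

def generate(model, start="a", max_words=8, seed=0):
    """Random walk through the bigram table from a start word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words - 1):
        choices = model.get(words[-1])
        if not choices:
            break
        words.append(rng.choice(choices))
    return " ".join(words)

# Identical code, identical prompt -- only the training data differs.
neutral_model = train_bigrams(NEUTRAL_CAPTIONS)
dark_model = train_bigrams(DARK_CAPTIONS)
print(generate(neutral_model, "a"))
print(generate(dark_model, "a"))
```

Because the generator can only emit words it has seen, every "dark" output is drawn entirely from the dark corpus's vocabulary, which is the same mechanism, in miniature, that the researchers pointed to: the bias lives in the data, not the algorithm.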
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leading to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
Entity: Human (due to a decision or action made by humans)
Intent: Intentional (due to an expected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)
No population impact data reported.