Microsoft's Image Creator AI tool, powered by OpenAI's DALL-E 3, was exploited to generate violent and disturbing images, including depictions of decapitated politicians and celebrities, as well as racist and antisemitic content, despite the company's claims of having safety controls in place.
Microsoft's Image Creator, part of Bing and integrated into Windows Paint, uses OpenAI's DALL-E 3 technology to convert text into images. In October 2023, a user named Josh McDuffie discovered a "kill prompt" that could bypass the AI's safety guardrails to generate violent images, including depictions of the decapitation of politicians such as Joe Biden, Donald Trump, Hillary Clinton, and Pope Francis, as well as graphic violence against ethnic minorities.

McDuffie attempted to report the vulnerability through Microsoft's AI bug bounty program but was rejected twice. When journalists brought the issue to Microsoft's attention, the company acknowledged the problem, yet the tool continued generating disturbing content even after some modifications.

Separately, users on the far-right message board 4chan have exploited the same tool to create hundreds of Nazi propaganda images and antisemitic content since the tool's launch, with over 300 instances documented and more than 100,000 combined replies sharing apparent AI-generated hate content. Despite Microsoft's content policies prohibiting harmful imagery and its promises to address the issues, the safety systems have repeatedly failed to prevent the generation of violent, racist, and extremist content.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content. May involve providing advice on, or encouraging, harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed