Microsoft's Image Creator AI tool, powered by OpenAI's DALL-E 3, was exploited to generate violent and disturbing images, including depictions of decapitated politicians and celebrities, as well as racist and antisemitic content, despite the company's claims of having safety controls in place.
Microsoft's Image Creator, part of Bing and integrated into Windows Paint, uses OpenAI's DALL-E 3 technology to convert text into images. In October 2023, a user named Josh McDuffie discovered a "kill prompt" that could bypass the AI's safety guardrails to generate violent images, including depictions of the decapitation of politicians such as Joe Biden, Donald Trump, Hillary Clinton, and Pope Francis, as well as graphic violence against ethnic minorities.

McDuffie attempted to report the vulnerability through Microsoft's AI bug bounty program but was rejected twice. When journalists brought the issue to Microsoft's attention, the company acknowledged the problem, yet the tool continued generating disturbing content even after some modifications.

Separately, users on the far-right message board 4chan have exploited the same tool to create hundreds of Nazi propaganda images and antisemitic content since the tool's launch, with over 300 instances documented and more than 100,000 combined replies sharing apparent AI-generated hate content. Despite Microsoft's content policies prohibiting harmful imagery and its promises to address the issues, the safety systems have repeatedly failed to prevent the generation of violent, racist, and extremist content.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content. May involve providing advice on, or encouraging, harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed