A Microsoft employee discovered that Copilot Designer, the company's AI text-to-image generator, was producing inappropriate and harmful images (including sexual content, violence, underage drinking, and conspiracy theories) when given benign prompts. After Microsoft declined to remove the tool or add appropriate warnings, he filed complaints with the FTC and Microsoft's board.
Shane Jones, a principal software engineering manager at Microsoft, discovered safety issues with Copilot Designer, Microsoft's text-to-image generator launched in March 2023. While testing the tool in December, Jones found that benign prompts produced troubling output: 'car accident' yielded sexually objectified images of women, 'pro-choice' generated disturbing Star Wars imagery with mutated children and blood, and 'teenagers 420 party' created images of underage drinking and drug use. The tool also generated content reflecting political bias, corporate trademark misuse, and conspiracy theories.

Over three months, Jones repeatedly urged Microsoft to remove Copilot Designer from public use until better safeguards could be implemented, but the company declined. He also suggested adding content warnings and changing the Android app rating from 'E for Everyone' to 'Mature 17+', neither of which Microsoft implemented. In February, Jones filed a letter with the Federal Trade Commission and Microsoft's board of directors, expressing particular concern that parents and teachers might recommend the tool for children's school projects. Microsoft's legal team had previously told Jones to delete an earlier LinkedIn post about OpenAI's DALL-E model, which powers Copilot Designer, leading him to send another letter to US senators about the public safety risks and Microsoft's efforts to silence his concerns.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content. May involve providing advice for, or encouraging, harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, or child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed