Stability AI's open-source Stable Diffusion image generation model was used to create non-consensual pornographic deepfakes of celebrities and other explicit content after being released without effective content filters.
Stability AI released Stable Diffusion, an open-source AI image generator capable of producing realistic pictures from text prompts and running on consumer hardware. The model leaked early on the 4chan discussion board, where it was used to generate nude celebrity images and other pornographic content. While Stability AI included a Safety Classifier tool to detect and block offensive images, users can disable this filter. Unlike other AI art systems such as OpenAI's DALL-E 2, which enforce strict content filters, Stable Diffusion's open-source nature and ability to depict public figures enable the creation of pornographic deepfakes. The report notes that 90-95% of deepfakes are non-consensual and that about 90% target women. Experts warn that the system's ease of use and minimal resource requirements could enable personalized blackmail attacks and scaled harassment campaigns. Several content platforms, including OnlyFans and Patreon, have policies against deepfakes, but enforcement remains challenging.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Using AI systems to gain a personal advantage over others, such as through cheating, fraud, scams, blackmail, or targeted manipulation of beliefs or behavior. Examples include AI-facilitated plagiarism in research or education, impersonating a trusted or fabricated individual for illegitimate financial benefit, or creating humiliating or sexual imagery.
Human
Due to a decision or action made by humans
Intentional
Due to an expected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed