A Stanford Internet Observatory investigation found that LAION-5B, a massive AI training dataset used by popular text-to-image generators like Stable Diffusion, contained over 1,000 verified instances of child sexual abuse material scraped from mainstream websites.
Stanford Internet Observatory researchers investigated the LAION-5B dataset, which contains 5.85 billion image-text pairs scraped from the internet and is used to train popular text-to-image models, most notably Stable Diffusion 1.5; Google's Imagen was trained on the earlier, related LAION-400M dataset, and Midjourney has also reportedly drawn on LAION data. Using PhotoDNA and other detection tools, researchers identified 3,226 suspected instances of child sexual abuse material (CSAM), 1,008 of which were externally validated by the Canadian Centre for Child Protection. The CSAM had been scraped from mainstream platforms including Reddit, Twitter, WordPress, and Blogspot, as well as from adult video sites. The dataset was created by the German nonprofit LAION through automated web scraping with minimal content filtering. Following the report's publication in December 2023, LAION immediately took the datasets down out of 'an abundance of caution' and stated it would republish them after implementing better filtering. The presence of CSAM in training data raises concerns about the potential generation of new abusive content and the revictimization of children whose images were included without consent.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, possibly including advice or encouragement to take harmful action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
Human: due to a decision or action made by humans
Unintentional: due to an unexpected outcome from pursuing a goal
Pre-deployment: occurring before the AI is deployed