The NudeNet dataset used to train AI nudity detection tools contained over 120 images of child sexual abuse material (CSAM), exposing researchers and organizations who downloaded the dataset to legal liability and perpetuating harm to victims.
The NudeNet dataset, containing more than 700,000 images scraped from the internet, was used to train an AI image classifier designed to automatically detect nudity. The dataset was made available for download on Academic Torrents in June 2019. The Canadian Centre for Child Protection (C3P) discovered that it contained more than 120 images of identified or known victims of child sexual abuse material (CSAM), including nearly 70 images focused on the genital or anal area of children who are confirmed to be, or appear to be, pre-pubescent. Some images depicted sexual or abusive acts involving children and teenagers, such as fellatio or penile-vaginal penetration.

More than 250 academic works have cited or used the NudeNet dataset since it became available. A non-exhaustive review of 50 academic projects found that 13 made use of the dataset directly and 29 relied on the NudeNet classifier or model. People and organizations that downloaded the dataset would have had no way of knowing it contained CSAM unless they went looking for it, yet possessing those images on their machines would technically constitute a criminal offense. Academic Torrents removed the dataset after C3P issued a removal notice to its administrators.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.
Human: Due to a decision or action made by humans
Unintentional: Due to an unexpected outcome from pursuing a goal
Pre-deployment: Occurring before the AI is deployed