During early testing, users exploited Meta's AI-generated sticker tool to create inappropriate content, including child soldiers, nude political figures, and sexualized cartoon characters.
Meta announced AI-generated chat stickers at its Connect event last week, powered by its Llama 2 large language model. The feature allows users to create "multiple unique, high-quality stickers in seconds" from text-based prompts.

During early user testing on Facebook Messenger, users discovered they could generate inappropriate content, including child soldiers, gun-wielding Nintendo characters, Mickey Mouse in compromising situations, and nude illustrations of Canadian Prime Minister Justin Trudeau. Other examples showed the tool producing sexualized images by adding breasts to various characters, including Sonic the Hedgehog and Karl Marx, and even generating an image of a woman breastfeeding Pikachu.

While certain words appear to be blocked and trigger warnings about community guideline violations, users reported they could circumvent these restrictions with typos or alternative descriptions. Some prompts, such as "World Trade Center," generated problematic images without any additional descriptors.

The AI-generated stickers are currently rolling out to "select English language users" across Facebook Stories, Instagram Stories and DMs, Messenger, and WhatsApp, though the exact number of users with access is unclear. Meta is pursuing a limited rollout specifically so it can address and correct such abuse before wider deployment.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI that exposes users to harmful, abusive, unsafe, or inappropriate content, which may involve providing advice or encouraging action. Examples of toxic content include hate speech, violence, extremism, illegal acts, and child sexual abuse material, as well as content that violates community norms, such as profanity, inflammatory political speech, or pornography.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed
No population impact data reported.