OpenAI's GPT-4o tokenizer was found to contain numerous Chinese tokens consisting of spam, pornography, and gambling phrases; these tokens could be used to jailbreak the model and bypass its safety guardrails.
On May 13, 2024, OpenAI released GPT-4o with a new tokenizer designed to better handle non-English languages. On May 14, Princeton PhD student Tianle Cai discovered that the 100 longest Chinese tokens in GPT-4o's public token library contained predominantly spam and pornographic content, with only three being commonly used phrases. The longest token was a 10.5-character phrase meaning 'free Japanese porn video to watch.'

The tokenizer contains 200,000 tokens total, with about 25% in non-English languages. Researchers found these problematic tokens likely resulted from insufficient data cleaning, with training data contaminated by spam websites that hijack content to boost pornography and gambling advertisements.

The tokens can cause GPT-4o to hallucinate unrelated answers and can be used to jailbreak the model, bypassing OpenAI's safety guardrails to generate prohibited content. While the model's safety mechanisms eventually block unsafe outputs from being displayed, researchers successfully used these tokens to get GPT-4o to begin generating instructions for making bombs before being stopped. Similar issues were reported with Korean tokens, but Hindi and Bengali tokens appeared clean.
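The audit described above can be sketched in a few lines of Python: scan a tokenizer's vocabulary for the longest tokens made entirely of Chinese (CJK) characters. The `toy_vocab`, `is_cjk`, and `longest_cjk_tokens` names below are illustrative, and the tiny stand-in vocabulary is not real GPT-4o data; with the `tiktoken` package installed, the real vocabulary can be loaded from the `o200k_base` encoding as shown in the comment.

```python
# Minimal sketch of the vocabulary audit: find the longest all-CJK tokens.
# With tiktoken installed, the real GPT-4o vocabulary would be loaded via:
#   enc = tiktoken.get_encoding("o200k_base")
#   vocab = {enc.decode_single_token_bytes(i): i for i in range(enc.n_vocab)}

def is_cjk(ch: str) -> bool:
    """True if ch falls in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def longest_cjk_tokens(vocab, top_n=100):
    """Return the top_n longest all-CJK tokens, longest first."""
    decoded = []
    for token_bytes in vocab:
        try:
            text = token_bytes.decode("utf-8")
        except UnicodeDecodeError:
            continue  # BPE vocabularies contain partial multi-byte sequences
        stripped = text.strip()
        if stripped and all(is_cjk(c) for c in stripped):
            decoded.append(stripped)
    return sorted(decoded, key=len, reverse=True)[:top_n]

# Illustrative token-bytes -> id mapping (not real o200k_base entries).
toy_vocab = {
    "hello".encode(): 0,
    "中华人民共和国".encode(): 1,        # a legitimate long Chinese token
    "你好".encode(): 2,
    b"\xe4\xb8": 3,                      # truncated UTF-8 fragment, skipped
    "日本毛片免费视频观看".encode(): 4,  # spam phrase of the kind reported
}

print(longest_cjk_tokens(toy_vocab, top_n=3))
```

Sorting by decoded character count (rather than byte count) is what surfaces the long spam phrases: in a well-cleaned vocabulary, very long single tokens are rare, so their dominance at the top of the list was the signal that the training data was contaminated.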
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, exposing users to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
Human
Due to a decision or action made by humans
Unintentional
Due to an unexpected outcome from pursuing a goal
Pre-deployment
Occurring before the AI is deployed
No population impact data reported.