Researchers identified 212 malicious large language model services ('Mallas') operating on underground marketplaces. These services use AI systems such as OpenAI's GPT models to generate malware, phishing emails, and scam websites for cybercriminals.
Researchers from Indiana University Bloomington conducted a systematic study of 212 'Mallas' (malicious LLM services) operating on underground marketplaces between November 2022 and October 2023. They collected 13,353 listings from nine underground marketplaces, including Abacus Market and Kerberos Market. The study found that 93.4% of these malicious services offered malware generation capabilities, 41.5% created phishing emails, and 17.45% generated scam websites. The researchers identified five backend LLMs being exploited: OpenAI GPT-3.5, OpenAI GPT-4, Pygmalion-13B, Claude-instant, and Claude-2-100k, with OpenAI models targeted most frequently. Services such as FraudGPT, WormGPT, EscapeGPT, and DarkGPT were found to produce sophisticated malware capable of evading antivirus detection, as well as convincing phishing content.

The malicious actors relied on two primary techniques: exploiting 'uncensored' open-source models that ship with minimal safety checks, and 'jailbreaking' commercial models to bypass their safety measures, for which the researchers cataloged 182 distinct jailbreak prompts. They found that OpenAI's GPT-3.5 Turbo was particularly susceptible to jailbreak prompts.
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Using AI systems to develop cyber weapons (e.g., by coding cheaper, more effective malware), to develop new weapons or enhance existing ones (e.g., Lethal Autonomous Weapons or chemical, biological, radiological, nuclear, and high-yield explosive weapons), or to use weapons to cause mass harm.
Human: Due to a decision or action made by humans
Intentional: Due to an expected outcome from pursuing a goal
Post-deployment: Occurring after the AI model has been trained and deployed
No population impact data reported.