AI companies' web crawlers overwhelmed open source infrastructure with aggressive data scraping, causing outages and forcing projects to implement defensive measures that also impacted legitimate users.
Multiple AI companies deployed web crawlers that aggressively scraped data from open source software repositories and websites, causing widespread infrastructure problems. SourceHut, KDE's GitLab, GNOME's GitLab, Fedora's pagure.io, Inkscape, and other FOSS projects experienced severe outages and performance degradation. The crawlers ignored robots.txt files, spoofed user agents to pose as legitimate browsers, and rotated through thousands of IP addresses to evade detection. Their access patterns were also disproportionately expensive, hitting endpoints such as git blame and log pages, with some crawlers returning every 6 hours to re-scrape the same content.

Projects responded with increasingly drastic countermeasures. GNOME deployed Anubis, a proof-of-work system that requires browsers to solve a computational puzzle before a page is served; it blocked 97% of bot traffic but delayed legitimate users by up to 2 minutes. Fedora was forced to block entire countries, including Brazil, to keep services available. Read the Docs reported that blocking AI crawlers cut its traffic by 75%, from 800 GB/day to 200 GB/day, saving about $1,500 per month. Dennis Schubert's analysis of Diaspora infrastructure showed AI crawlers comprised 70% of web traffic, with OpenAI accounting for 25%, Amazon 15%, and Anthropic 4.3%.
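To illustrate the proof-of-work approach, here is a minimal Python sketch of the general idea behind challenge systems like Anubis: the server issues a seed, the client must find a nonce whose hash meets a difficulty target, and the server verifies the result with a single cheap hash. The seed handling, difficulty, and hash construction below are illustrative assumptions, not Anubis's actual protocol.

```python
import hashlib
import itertools

def solve_challenge(seed: str, difficulty: int = 4) -> int:
    """Client-side work: find a nonce whose SHA-256 digest of seed+nonce
    starts with `difficulty` hex zeros. Cost grows ~16x per extra zero."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify(seed: str, nonce: int, difficulty: int = 4) -> bool:
    """Server-side check: one hash. Cheap per visitor, but the solving
    cost compounds for a crawler re-fetching millions of URLs."""
    digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Hypothetical flow: the server issues a per-request seed; the browser
# burns CPU finding the nonce before the page is served.
nonce = solve_challenge("example-seed")
assert verify("example-seed", nonce)
```

The asymmetry is the point: verification is a single hash for the site, while solving imposes a per-page cost that is negligible for one human visitor but prohibitive at crawler scale. The same asymmetry explains the reported downside, since slow client devices can take minutes to solve a challenge.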
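Traffic breakdowns like Schubert's are typically derived from web server access logs. The sketch below, assuming a hypothetical log path and a hand-picked list of self-identifying crawler user-agent tokens (GPTBot, ClaudeBot, Amazonbot are real tokens; the list is not exhaustive), estimates each crawler's share of requests. Note the limitation the incident itself highlights: this only counts crawlers that identify themselves, and many spoofed browser user agents instead.

```python
import re
from collections import Counter

# Substrings of known AI crawler user agents (illustrative, not complete).
CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "Amazonbot"]

# In combined log format the user agent is the last quoted field on the line.
ua_pattern = re.compile(r'"[^"]*" "([^"]*)"$')

counts = Counter()
total = 0
with open("access.log") as log:  # hypothetical path
    for line in log:
        match = ua_pattern.search(line.strip())
        if not match:
            continue
        total += 1
        ua = match.group(1)
        # Attribute the request to the first matching crawler, else "other".
        token = next((t for t in CRAWLER_TOKENS if t in ua), "other")
        counts[token] += 1

for token, n in counts.most_common():
    print(f"{token}: {n / total:.1%}")
```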
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Risk Subdomain: AI-driven concentration of power and resources within certain entities or groups, especially those with access to or ownership of powerful AI systems, leading to inequitable distribution of benefits and increased societal inequality.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Intentional (due to an expected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)