AI companies' web crawlers overwhelmed open source infrastructure with aggressive data scraping, causing outages and forcing projects to implement defensive measures that also impacted legitimate users.
Multiple AI companies deployed web crawlers that aggressively scraped data from open source software repositories and websites, causing widespread infrastructure problems. SourceHut, KDE's GitLab, GNOME's GitLab, Fedora's pagure.io, Inkscape, and other FOSS projects experienced severe outages and performance degradation. The crawlers ignored robots.txt files, spoofed user agents to pose as legitimate browsers, and rotated through thousands of IP addresses to evade detection. Their access patterns were also disproportionately expensive, hitting endpoints such as git blame and log pages, with some crawlers returning every 6 hours to re-scrape the same content.

Projects responded with increasingly drastic countermeasures. GNOME deployed Anubis, a proof-of-work system that requires browsers to solve a computational puzzle before a page is served; it blocked 97% of bot traffic but delayed legitimate users by up to 2 minutes. Fedora was forced to block entire countries, including Brazil, to keep services available. Read the Docs reported that blocking AI crawlers cut its traffic by 75%, from 800 GB/day to 200 GB/day, saving about $1,500 per month. Dennis Schubert's analysis of Diaspora infrastructure showed AI crawlers comprised 70% of web traffic, with OpenAI accounting for 25%, Amazon 15%, and Anthropic 4.3%.
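To illustrate the proof-of-work approach, here is a minimal Python sketch of the general idea behind challenge systems like Anubis: the server issues a seed, the client must find a nonce whose hash meets a difficulty target, and the server verifies the result with a single cheap hash. The seed handling, difficulty, and hash construction below are illustrative assumptions, not Anubis's actual protocol.

```python
import hashlib
import itertools

def solve_challenge(seed: str, difficulty: int = 4) -> int:
    """Client-side work: find a nonce whose SHA-256 digest of seed+nonce
    starts with `difficulty` hex zeros. Cost grows ~16x per extra zero."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify(seed: str, nonce: int, difficulty: int = 4) -> bool:
    """Server-side check: one hash. Cheap per visitor, but the solving
    cost compounds for a crawler re-fetching millions of URLs."""
    digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Hypothetical flow: the server issues a per-request seed; the browser
# burns CPU finding the nonce before the page is served.
nonce = solve_challenge("example-seed")
assert verify("example-seed", nonce)
```

The asymmetry is the point: verification is a single hash for the site, while solving imposes a per-page cost that is negligible for one human visitor but prohibitive at crawler scale. The same asymmetry explains the reported downside, since slow client devices can take minutes to solve a challenge.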
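Traffic breakdowns like Schubert's are typically derived from web server access logs. The sketch below, assuming a hypothetical log path and a hand-picked list of self-identifying crawler user-agent tokens (GPTBot, ClaudeBot, Amazonbot are real tokens; the list is not exhaustive), estimates each crawler's share of requests. Note the limitation the incident itself highlights: this only counts crawlers that identify themselves, and many spoofed browser user agents instead.

```python
import re
from collections import Counter

# Substrings of known AI crawler user agents (illustrative, not complete).
CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "Amazonbot"]

# In combined log format the user agent is the last quoted field on the line.
ua_pattern = re.compile(r'"[^"]*" "([^"]*)"$')

counts = Counter()
total = 0
with open("access.log") as log:  # hypothetical path
    for line in log:
        match = ua_pattern.search(line.strip())
        if not match:
            continue
        total += 1
        ua = match.group(1)
        # Attribute the request to the first matching crawler, else "other".
        token = next((t for t in CRAWLER_TOKENS if t in ua), "other")
        counts[token] += 1

for token, n in counts.most_common():
    print(f"{token}: {n / total:.1%}")
```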
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
Risk Subdomain: AI-driven concentration of power and resources within certain entities or groups, especially those with access to or ownership of powerful AI systems, leading to inequitable distribution of benefits and increased societal inequality.
Entity: AI system (due to a decision or action made by an AI system)
Intent: Intentional (due to an expected outcome from pursuing a goal)
Timing: Post-deployment (occurring after the AI model has been trained and deployed)