AI Crawlers Push Wikimedia Commons Bandwidth to Critical Levels
AI crawlers have caused Wikimedia Commons bandwidth demands to surge by 50%, straining infrastructure and disrupting user access.
Wikimedia Commons is grappling with a significant challenge: AI crawlers have driven its bandwidth demands up by 50% since January 2024. These automated bots, built to scrape multimedia content for training AI models, are overwhelming the infrastructure of one of the internet’s most important repositories of freely licensed media. The surge in traffic is not only costly but also disruptive to regular users who rely on Wikimedia’s services for research and education.
The Problem with AI Crawlers
Unlike human users, who typically access specific, popular pages, AI crawlers indiscriminately “bulk read” vast amounts of content, including rarely visited pages. Because that long-tail content is not held in the regional caching datacenters, those requests fall through to the core datacenter, which is significantly more expensive to serve. According to the Wikimedia Foundation, bots generate 65% of this most resource-intensive traffic, even though they account for only about 35% of total pageviews.
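To make the caching argument concrete, the toy sketch below contrasts a “human-like” request stream concentrated on a few popular pages with a “crawler-like” stream spread evenly across the long tail. The page counts, cache size, and distributions are illustrative assumptions, not Wikimedia’s real figures; the point is only that uniform bulk reads defeat an edge cache and push work to the origin.

```python
import random
from collections import OrderedDict

# Toy model of an edge cache in front of an expensive core datacenter.
# All numbers here are illustrative assumptions, not Wikimedia's real figures.
NUM_PAGES = 10_000
CACHE_SIZE = 500          # the edge cache holds at most 500 pages
NUM_REQUESTS = 50_000

def hit_rate(requests, cache_size=CACHE_SIZE):
    """Simulate a simple LRU edge cache and return the fraction of cache hits."""
    cache = OrderedDict()
    hits = 0
    for page in requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # refresh recency on a hit
        else:
            cache[page] = True             # miss: fetch from the core datacenter
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(requests)

random.seed(0)

# "Human-like" traffic: most requests go to a small set of popular pages.
popular = range(200)
human_requests = [random.choice(popular) if random.random() < 0.9
                  else random.randrange(NUM_PAGES)
                  for _ in range(NUM_REQUESTS)]

# "Crawler-like" traffic: bulk reads spread uniformly over every page.
crawler_requests = [random.randrange(NUM_PAGES) for _ in range(NUM_REQUESTS)]

print(f"human-like hit rate:   {hit_rate(human_requests):.0%}")
print(f"crawler-like hit rate: {hit_rate(crawler_requests):.0%}")
```

Under these assumptions the human-like stream is served almost entirely from the cache, while the crawler-like stream misses roughly 95% of the time, and every miss becomes a request the core datacenter has to absorb.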
This unprecedented demand is causing constant disruption for the Site Reliability team, which must repeatedly block these bots to prevent service degradation for human users. The strain on resources has led to growing risks and costs for Wikimedia, which relies on donations to maintain its operations.
Why It Matters
The issue highlights a broader trend affecting open-source infrastructure globally. AI companies are increasingly scraping websites like Wikipedia to train generative models without adhering to ethical guidelines or respecting “robots.txt” directives designed to limit automated traffic. This raises concerns about sustainability as publishers face mounting operational costs without receiving added value or attribution from AI-driven traffic.
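A robots.txt file is the standard, voluntary mechanism for expressing those limits. The sketch below shows what such directives look like and how a well-behaved crawler would check them before fetching a page; the user-agent names and paths are illustrative examples rather than Wikimedia’s actual policy, and compliance is entirely up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt directives of the kind publishers use to limit automated
# traffic. The user-agent names and paths are illustrative, not Wikimedia's.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /w/index.php?
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks permission before every fetch; nothing enforces this.
for agent in ("GPTBot", "CCBot", "ResearchBot"):
    allowed = parser.can_fetch(agent, "https://commons.example.org/wiki/File:Example.jpg")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

The catch, as noted above, is that robots.txt is advisory: a crawler that ignores it faces no technical barrier, which is why operators fall back on active blocking.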
Industry Response and Potential Solutions
Some organizations are fighting back. For instance:
- Cloudflare’s AI Labyrinth: a set of measures that slow crawlers down by feeding them AI-generated decoy content.
- Blocking mechanisms: Wikimedia has ramped up efforts to block disruptive bots while exploring sustainable access channels for developers and reusers (a minimal sketch of this kind of filtering follows the list).
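As an illustration of what such blocking can look like in practice, here is a minimal WSGI middleware that rejects requests from a blocklist of crawler user agents and applies a crude per-client request cap. The bot names, limits, and overall approach are assumptions made for the example; Wikimedia’s actual tooling is more sophisticated and is not described in the source.

```python
import time
from collections import defaultdict, deque

# Illustrative blocklist and rate limit; real operators tune these continuously.
BLOCKED_AGENT_SUBSTRINGS = ("GPTBot", "CCBot", "Bytespider")
MAX_REQUESTS = 100          # per client...
WINDOW_SECONDS = 60         # ...per rolling minute

class CrawlerFilter:
    """WSGI middleware: reject blocklisted user agents and crude bulk readers."""

    def __init__(self, app):
        self.app = app
        self.history = defaultdict(deque)   # client address -> recent request times

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        client = environ.get("REMOTE_ADDR", "unknown")
        now = time.monotonic()

        # 1. Hard block: known disruptive crawler user agents.
        if any(bad in agent for bad in BLOCKED_AGENT_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated crawling is not permitted.\n"]

        # 2. Soft block: clients reading far faster than a human plausibly could.
        window = self.history[client]
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        window.append(now)
        if len(window) > MAX_REQUESTS:
            start_response("429 Too Many Requests", [("Retry-After", str(WINDOW_SECONDS))])
            return [b"Slow down.\n"]

        return self.app(environ, start_response)
```

In practice a filter like this sits alongside the caching layer described earlier, so legitimate traffic keeps being served from the edge while blocklists and limits are adjusted as crawlers change their signatures.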
Despite these efforts, the cat-and-mouse dynamic between publishers and scrapers continues to intensify. If left unchecked, this trend could force many open platforms behind paywalls or logins, restricting free access to knowledge across the web.
The Road Ahead
Wikimedia is now prioritizing systemic solutions in its upcoming fiscal year. The foundation aims to establish responsible use frameworks that balance accessibility with sustainability. However, with no signs of AI-related traffic slowing down, the challenge remains daunting.
As AI technologies evolve, it’s crucial for stakeholders across industries to address these issues collaboratively. Without proper governance and ethical standards for AI scraping practices, open knowledge platforms like Wikimedia risk becoming unsustainable—a loss that would impact millions worldwide.

