The UK's Competition and Markets Authority has opened a groundbreaking consultation on conduct requirements for Google, addressing how the tech giant uses its search dominance to fuel generative AI services. While the proposed rules represent progress, they may not go far enough to level the playing field for publishers and competing AI companies. Cloudflare's data and analysis reveal why mandatory crawler separation is the only truly effective solution.

The CMA's Historic Intervention

In January 2025, the UK implemented the Digital Markets, Competition and Consumers Act 2024, fundamentally shifting how regulators can address digital market power. Rather than relying on reactive antitrust investigations, the CMA can now proactively designate firms as having Strategic Market Status (SMS) when they hold substantial, entrenched market power, then impose detailed conduct requirements to improve competition.

In October 2025, the CMA designated Google as having SMS in general search and search advertising, given its commanding 90% share of the UK search market. Crucially, this designation encompasses AI Overviews and AI Mode, granting the CMA authority to impose legally enforceable rules on Google's entire search ecosystem, including its AI features.

The Publisher's Impossible Choice

The CMA correctly identifies a fundamental problem: publishers have no realistic option but to allow Googlebot to crawl their content because of Google's overwhelming market power in search. However, Google currently uses that same content for both traditional search indexing and for powering AI-generated responses like AI Overviews and AI Mode.

This creates an impossible dilemma for publishers. They cannot afford to block Googlebot without sacrificing visibility in search results and the traffic—and advertising revenue—that comes with it. But by allowing Googlebot access for search indexing, they're also granting Google free access to their content for AI features that compete directly with them.
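To make the bind concrete, here is a simplified robots.txt sketch (illustrative only, not any specific publisher's file). A publisher can refuse standalone AI crawlers outright, but there is no directive that admits Googlebot for search indexing while keeping that same crawl out of Google's AI features:

    # Standalone AI crawlers can be refused without any effect on search:
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    # Googlebot must be allowed to remain visible in search, and the same
    # crawl then also feeds AI Overviews and AI Mode:
    User-agent: Googlebot
    Allow: /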

AI Overviews and similar features often return little to no traffic to publisher websites. These AI-generated summaries reproduce publisher content, frequently without meaningful attribution or compensation, directly competing with the original sources while undermining the ad-supported business models that have sustained digital publishing for decades.

Google's Structural Competitive Advantage

This forced acceptance gives Google an unfair competitive advantage in the generative and agentic AI market. Unlike other AI operators, Google can leverage its search crawler to gather vast amounts of data for diverse AI functions with minimal fear of restricted access. There's little incentive to negotiate fair compensation when the data is already flowing freely.

This dynamic prevents the emergence of a functional marketplace in which AI developers negotiate fair value for content. Other AI companies are structurally disadvantaged and disincentivized from coming to the table when one dominant player can bypass compensation entirely.

Data Reveals the Advantage

Cloudflare's network data validates concerns about Google's competitive advantage. Based on observations over two months, Googlebot accessed significantly more Internet content than any peer crawler:

Googlebot successfully accessed almost twice as many individual pages as ClaudeBot and GPTBot, three times as many as Meta-ExternalAgent, and more than three times as many as Bingbot. The gap was even more extreme for other popular crawlers: Googlebot saw 167 times more unique pages than PerplexityBot.

Of the unique URLs sampled over two months, Googlebot crawled roughly 8% of the content on Cloudflare's network. In rounded multiples, Googlebot sees:

1.70x the unique URLs seen by ClaudeBot
1.76x the unique URLs seen by GPTBot
2.99x the unique URLs seen by Meta-ExternalAgent
3.26x the unique URLs seen by Bingbot
5.09x the unique URLs seen by Amazonbot
14.87x the unique URLs seen by Applebot
23.73x the unique URLs seen by Bytespider
166.98x the unique URLs seen by PerplexityBot
714.48x the unique URLs seen by CCBot
1801.97x the unique URLs seen by archive.org_bot

This access differential provides Google with an enormous data advantage for training and operating AI systems.

Publisher Behavior Confirms the Dilemma

Cloudflare's data on publisher behavior further illustrates the bind publishers face. Analysis of robots.txt files shows that almost no websites explicitly disallow Googlebot in full, and partial disallows typically cover only low-value sections such as login endpoints. This reflects Googlebot's critical importance for driving traffic via search referrals.
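As an illustration, a typical partial disallow looks something like the following (the paths are representative, not drawn from any specific site); everything that matters for search remains fully open to Googlebot:

    User-agent: Googlebot
    Disallow: /login
    Disallow: /account/
    # All other content stays crawlable for search.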

Similarly, when examining customers using Cloudflare's AI Crawl Control to actively block crawlers via Web Application Firewall rules, the pattern is clear. Between July 2025 and January 2026, the number of websites actively blocking popular AI crawlers such as GPTBot and ClaudeBot was nearly seven times higher than the number blocking Googlebot and Bingbot.
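For context, such blocks are typically expressed as a custom rule that matches on the crawler's user agent and is paired with a Block action. A minimal sketch in Cloudflare's Rules language (simplified for illustration; real rules often also check verified-bot status and other fields):

    (http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot")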

Publishers desperately want to control AI crawler access, but they cannot risk blocking the crawlers tied to dominant search engines. This is exactly the market power problem the CMA must address.

The CMA's Proposed Solution Falls Short

The CMA's proposed conduct requirements would mandate that Google grant publishers "meaningful and effective" control over whether their content feeds AI features. Google would be prohibited from retaliating by downranking content in search. The proposal also requires increased transparency about how crawled content is used and detailed engagement metrics to help publishers evaluate commercial value.

While supportive of these efforts, Cloudflare believes the proposed requirements don't solve the underlying issue. Publishers would still be effectively forced to use Google's proprietary opt-out mechanisms, operating under conditions set by Google rather than having direct, autonomous control.

This framework doesn't offer genuine control to content creators or encourage competitive innovation. Instead, it reinforces permanent dependency, with the platform dictating the rules, managing the technical controls, and defining the scope of application.

It also reduces publisher choice. Even with the new opt-out controls, publishers could not use external tools to block Googlebot without jeopardizing their appearance in search. They would still need to allow Googlebot to scrape their websites, with no independent enforcement mechanism and limited visibility into whether Google actually respects their signaled preferences.

Cloudflare has received customer feedback that Google's existing proprietary opt-out mechanisms, including Google-Extended and 'nosnippet', have failed to prevent publishers' content from being used in ways they cannot control. Nor do these tools provide any mechanism for fair compensation.
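For reference, Google-Extended is a robots.txt control token rather than a separate crawler, and 'nosnippet' is a per-page robots meta directive (for example, <meta name="robots" content="nosnippet">); neither stops the underlying Googlebot crawl. A minimal robots.txt entry for the former:

    # Opts out of Gemini model training, but the fetching is still performed
    # by Googlebot, and per Google's documentation this token does not govern
    # Search features such as AI Overviews:
    User-agent: Google-Extended
    Disallow: /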

The Case for Mandatory Crawler Separation

Cloudflare believes all AI bots should have one distinct purpose and declare it clearly, enabling website owners to make informed decisions about who accesses their content and why. Unlike leading competitors such as OpenAI and Anthropic, Google does not follow this principle: Googlebot serves multiple purposes at once, spanning search indexing, AI training, and inference/grounding.

Simply requiring Google to develop new opt-out mechanisms won't give publishers meaningful control. The most effective solution is requiring Googlebot to be split into separate crawlers by purpose. Publishers could then allow crawling for traditional search indexing while blocking access for unwanted use in generative AI services and features.
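Under separation, that choice could be expressed directly in robots.txt. A sketch of what this might look like, using hypothetical purpose-specific crawler names (Google has announced no such tokens):

    # Permit classic search indexing:
    User-agent: Googlebot
    Allow: /

    # Hypothetical purpose-specific crawlers, blocked by this publisher:
    User-agent: Googlebot-AI-Training
    Disallow: /

    User-agent: Googlebot-AI-Grounding
    Disallow: /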

Why Separation Is the Only Effective Remedy

To ensure a fair digital ecosystem, the CMA must empower content owners to prevent Google from accessing their data for specific purposes in the first place, rather than relying on post-access, Google-managed workarounds. This approach also enables creators to set access conditions.

Although the CMA described crawler separation as an "equally effective intervention," it rejected mandating separation based on Google's input about operational burden. This is a mistake.

Requiring Google to split Googlebot by purpose, just as it already does for its nearly 20 other crawlers (such as Googlebot-Image, Googlebot-News, and AdsBot-Google), is technically feasible, necessary, and proportionate. It would give website operators the granular control they currently lack, without increasing crawler traffic to websites, and could even decrease it where sites choose to block AI crawling.

Crucially, crawler separation does more than give UK publishers control: it also benefits AI companies by leveling the playing field. It's not a disadvantage to Google, nor does it undermine AI investment. On the contrary, it's a pro-competitive safeguard preventing Google from leveraging its search monopoly for unfair advantage in AI markets. Decoupling these functions ensures that AI development is driven by fair market competition rather than by the exploitation of a single hyperscaler's dominance.

A Historic Opportunity

The UK has a unique chance to lead the world in protecting the value of original, high-quality content on the Internet. However, the current proposals fall short. The rules must ensure Google operates under the same content-access conditions as other AI developers, meaningfully restoring agency to publishers and paving the way for new business models that let creators monetize their content.

Cloudflare remains committed to engaging with the CMA and other partners during the consultation, providing evidence and data to shape final conduct requirements that are targeted, proportionate, and effective. The CMA still has an opportunity to ensure the Internet becomes a fair marketplace for content creators and smaller AI players, not just a select few tech giants.

Source: Google's AI advantage: why crawler separation is the only path to a fair Internet - Cloudflare Blog