The UK's Competition and Markets Authority (CMA) has opened a landmark consultation on proposed conduct requirements for Google, marking the first regulatory intervention under the UK's new digital markets competition regime. At stake is the future of content access, publisher rights, and fair competition in the rapidly evolving generative AI landscape.

The Strategic Market Status Designation

In October 2025, the CMA designated Google as having Strategic Market Status (SMS) in general search and search advertising, citing its commanding 90% share of the UK search market. This designation, enabled by the Digital Markets, Competition and Consumers Act 2024 (DMCC), grants the CMA authority to impose legally enforceable conduct requirements on Google's search ecosystem—including its AI-powered features like AI Overviews and AI Mode.

Unlike traditional antitrust investigations, the SMS framework allows proactive intervention to address risks to competition before they calcify into permanent market distortions. The timing couldn't be more critical, as the intersection of search dominance and generative AI creates unprecedented competitive dynamics.

The Publisher Dilemma

Publishers face an impossible choice: allow Googlebot to crawl their content for search indexing—essential for driving traffic and ad revenue—or block it and lose visibility in the world's dominant search engine. Google's current architecture exploits this dependency by using the same crawler for multiple purposes: traditional search indexing, AI training, and real-time inference for generative AI features.

This creates what the CMA correctly identifies as a fundamental problem. Publishers cannot block Googlebot without jeopardizing their business, yet allowing it means their content is automatically incorporated into AI-generated responses that compete directly with their own sites—often without meaningful attribution or compensation.

The result: publishers watch as Google's AI features reproduce their content, returning minimal traffic while undermining the ad-supported business models that have sustained digital publishing for decades.

Google's Structural Advantage

Cloudflare data quantifies Google's competitive edge with striking clarity. Over a two-month observation period, Googlebot successfully accessed nearly twice as many unique pages as ClaudeBot and GPTBot, roughly three times as many as Meta-ExternalAgent and Bingbot, and an astounding 167 times as many as PerplexityBot.

Breaking down the multiples:
- 1.7x more unique URLs than ClaudeBot
- 1.76x more than GPTBot
- 2.99x more than Meta-ExternalAgent
- 3.26x more than Bingbot
- 167x more than PerplexityBot
- 1,802x more than archive.org_bot

This disparity isn't accidental—it reflects Googlebot's unique position as a dual-purpose crawler that publishers dare not block. The data reveals that websites explicitly disallow or block AI-focused crawlers at rates nearly seven times higher than they block Googlebot, despite Googlebot also being used for AI purposes.

Publishers understand the mathematics: blocking GPTBot protects content from OpenAI's training while maintaining search visibility. Blocking Googlebot means disappearing from search results entirely—an existential threat no content creator can afford.
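To make that choice concrete, here is a minimal robots.txt sketch of the position publishers are in today. GPTBot and Googlebot are the publicly documented user-agent tokens for OpenAI's and Google's crawlers, and the directives are standard Robots Exclusion Protocol syntax:

```
# Blocking OpenAI's training crawler costs nothing in Google
# search visibility:
User-agent: GPTBot
Disallow: /

# But Googlebot cannot be refused for its AI uses alone. The same
# crawler feeds search indexing, so this group must stay permissive
# or the site vanishes from Google Search:
User-agent: Googlebot
Allow: /
```

The file can express a preference about OpenAI's crawler with surgical precision, but offers no way to say "index me for search, yet keep my content out of AI Overviews." That distinction simply does not exist in Google's current single-crawler architecture.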

The CMA's Proposed Requirements Fall Short

The CMA's January 2026 consultation proposes requiring Google to provide "meaningful and effective" publisher controls, increase transparency about content usage, and ensure proper attribution. These requirements include:

- Opt-out mechanisms for AI features separate from search indexing
- Clear documentation on how crawled content is used
- Disaggregated engagement metrics for publishers
- Prohibitions on retaliatory downranking

While these measures represent progress, they fundamentally miss the mark. Requiring Google to create proprietary opt-out mechanisms still forces publishers to operate within Google's controlled framework. Publishers must trust Google's implementation, rely on Google's enforcement, and accept Google's definitions of what constitutes compliant behavior.

This isn't effective control—it's managed dependency.

Why Mandatory Crawler Separation Is the Only Solution

Cloudflare, along with major UK publishers including the Daily Mail Group, The Guardian, and the News Media Association, advocates for a different approach: mandatory crawler separation. This means requiring Google to deploy distinct crawlers for distinct purposes:

- One crawler for traditional search indexing
- Separate crawlers for AI training
- Separate crawlers for inference/grounding in generative AI

This isn't unprecedented—Google already operates nearly 20 specialized crawlers for different purposes, including Googlebot-Image, Googlebot-News, and AdsBot-Google. Crawler separation is technically feasible, operationally straightforward, and aligned with responsible AI principles, which hold that a bot should have one clear purpose.

Separation provides true agency. Publishers could allow search indexing while blocking AI training, or permit inference but not training, or any other combination that aligns with their business interests. More importantly, they could enforce these preferences using standard web technologies like robots.txt or Web Application Firewalls—no special Google interfaces required.
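As an illustration of what that agency could look like, the sketch below shows a robots.txt file under a separation mandate. The purpose-specific tokens (Googlebot-AITraining, Googlebot-Inference) and the example path are hypothetical; no such separated Google crawlers exist today, and real names would depend on how a mandate was implemented:

```
# Illustrative only: the purpose-specific user-agent tokens below
# are hypothetical.

# Permit traditional search indexing:
User-agent: Googlebot
Allow: /

# Refuse crawling for AI model training:
User-agent: Googlebot-AITraining
Disallow: /

# Permit real-time inference/grounding, but only on a section the
# publisher chooses to expose:
User-agent: Googlebot-Inference
Disallow: /
Allow: /licensed-summaries/
```

Because these are ordinary Robots Exclusion Protocol directives, enforcement would not depend on any Google-specific dashboard: the same preferences could equally be expressed as Web Application Firewall rules matching each crawler's user agent.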

The Competition Argument

Crawler separation isn't anti-competitive—it's pro-competitive. Google's current approach leverages its search monopoly to gain an unfair advantage in the AI market. By forcing publishers to grant access for all purposes or none, Google obtains training and inference data that competitors must license or negotiate for.

This structural asymmetry prevents the emergence of fair content marketplaces. AI companies have minimal incentive to negotiate reasonable compensation when they know Google is obtaining the same content for free through its search monopoly. Publishers have no leverage to demand fair value when the alternative is search invisibility.

Mandatory separation levels the playing field. All AI companies—including Google—would compete on equal terms for content access. Publishers could monetize their content appropriately. The market could discover fair pricing through actual negotiation rather than monopoly-driven extraction.

Broader Implications for Content Creators

The UK's decision will reverberate globally. If the CMA accepts Google's argument that crawler separation is too onerous, it signals that dominant platforms can leverage one monopoly (search) to capture adjacent markets (AI) without meaningful constraint. If the CMA mandates separation, it establishes a template for protecting content creator rights while fostering genuine AI competition.

For publishers, creative professionals, and anyone who produces original content online, the stakes are existential. The Internet's value proposition has always been straightforward: create compelling content, attract an audience, monetize through advertising or subscriptions. AI-powered search features that reproduce content without driving traffic break this model fundamentally.

Looking Forward

Cloudflare remains committed to engaging with the CMA during the consultation process, providing data-driven evidence to support targeted, proportional, and effective conduct requirements. The opportunity exists to ensure that AI development proceeds on competitive foundations rather than monopoly exploitation.

The Internet deserves clear rules of the road for AI crawler behavior—rules that respect content creator rights, enable fair competition, and promote innovation rather than entrench dominance. Crawler separation isn't a barrier to AI development; it's a safeguard ensuring that development happens fairly.

The CMA's final decision will determine whether the UK leads the world in protecting content value and fostering competitive AI markets—or whether search monopolies successfully extend their dominance into the AI era unchecked. The consultation remains open, and the outcome will shape Internet governance for years to come.

Source: Cloudflare policy analysis published in response to UK CMA consultation on Google's Strategic Market Status conduct requirements. Data reflects observations from Cloudflare's global network.