Cloudflare adds feature to confuse AI scraper bots, drawing publishers’ praise

July 15, 2025

In March 2025, the CDN provider Cloudflare announced an anti-piracy feature designed to stop the operators of generative AI platforms from training their systems using unlicensed content from publishers. At the beginning of July, the feature became available to all users of Cloudflare’s services on an opt-in basis.

Cloudflare’s July announcement included praise for the move from a host of publishers. Cloudflare says that the portion of AI-generated content in some content categories is approaching 50%, and that AI crawlers submit more than 50 billion scraping requests to its network every day.

Cloudflare well positioned

Cloudflare estimates that it provides distribution management and protection services for 20% of the Web. In April 2025, Cloudflare said that more than 10% of all websites connect through its reverse proxy service, including 17% of the Fortune 1000.

A graph of daily requests over time, comparing different categories of AI Crawlers. Source: Cloudflare

According to reporting by the BBC, Cloudflare’s service initially covers about a million UK Web sites, representing about 20% of live Web sites in the UK. The BBC itself has a direct stake in the issue: in June, it had written to the US-based AI firm Perplexity, demanding that it “stop using BBC content, delete any it holds,” and compensate the BBC for content already used.

Technical countermeasures

Cloudflare’s methods include an opt-in service called AI Labyrinth, “a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect ‘no crawl’ directives. When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity.”

Another method is to use cryptography to verify bot and agent traffic. Historically, User-Agent headers have been used to identify legitimate Web crawlers, but these can be spoofed. A crawler can also be identified by its IP address, but addresses can be obscured through the use of proxy services and VPNs.
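The weakness of User-Agent checks can be illustrated with a minimal Python sketch. The allowlist `TRUSTED_CRAWLERS`, the helper `claimed_crawler`, and the example URL are hypothetical names chosen for illustration; they are not taken from Cloudflare's tooling.

```python
from typing import Optional
from urllib.request import Request

# Hypothetical allowlist of crawler names a site might choose to trust.
TRUSTED_CRAWLERS = ("Googlebot", "Bingbot")


def claimed_crawler(request: Request) -> Optional[str]:
    """Return the trusted crawler name the request claims to be, if any."""
    # urllib stores header names capitalized, so "User-Agent" becomes "User-agent".
    user_agent = request.get_header("User-agent", "")
    for name in TRUSTED_CRAWLERS:
        if name.lower() in user_agent.lower():
            return name
    return None


# The header is set by the client, so any client can claim any identity.
spoofed = Request(
    "https://example.com/article",
    headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
)

# The check has no way to tell this request from the genuine crawler.
print(claimed_crawler(spoofed))
```

Because the header is the only input, a check like this verifies what a client says about itself, not who the client is.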

In May 2025, Cloudflare proposed that bot and agent operators sign their requests cryptographically using HTTP Message Signatures, including a signature over the target URI, so that receiving sites can verify their authenticity.
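As a rough illustration of the idea, the Python sketch below signs and verifies a request in the general style of HTTP Message Signatures (RFC 9421). It is a simplified sketch, not a complete implementation of the standard and not Cloudflare's code. To stay within the standard library it assumes a shared-secret HMAC-SHA256 signature, whereas a scheme for verifying third-party bots would normally rely on public-key signatures. The function names, the label `sig1`, and the key ID are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlsplit

# Request components covered by the signature (illustrative choice).
COVERED = '("@authority" "@target-uri")'


def signature_base(target_uri: str, created: int, key_id: str) -> tuple[str, str]:
    """Return (signature_params, signature_base) for the covered components."""
    authority = urlsplit(target_uri).netloc.lower()
    params = f'{COVERED};created={created};keyid="{key_id}";alg="hmac-sha256"'
    base = "\n".join([
        f'"@authority": {authority}',
        f'"@target-uri": {target_uri}',
        f'"@signature-params": {params}',
    ])
    return params, base


def sign_request(target_uri: str, created: int, key_id: str, secret: bytes) -> dict[str, str]:
    """Build the two signature headers a sender would attach to its request."""
    params, base = signature_base(target_uri, created, key_id)
    digest = hmac.new(secret, base.encode("utf-8"), hashlib.sha256).digest()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(digest).decode('ascii')}:",
    }


def verify_request(target_uri: str, created: int, key_id: str,
                   secret: bytes, headers: dict[str, str]) -> bool:
    """Recompute the signature and compare it in constant time.

    Simplification: a real verifier would parse `created` and `keyid`
    out of the Signature-Input header instead of receiving them directly.
    """
    expected = sign_request(target_uri, created, key_id, secret)
    return hmac.compare_digest(expected["Signature"], headers.get("Signature", ""))
```

Because the target URI is part of the signed text, a signature captured from one request fails verification if it is replayed against a different URL.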

Cloudflare is also developing a system to help publishers request payment for the use of their content. Additional methods are linked toward the end of Cloudflare’s July 1 press release.

Not everyone agrees

Predictably, detractors of Cloudflare’s approach include the producers of the AI platforms that Cloudflare and others have set out to manage.

Stirring the pot in June 2024 at the Aspen Ideas Festival, Microsoft AI CEO Mustafa Suleyman said:

“With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding.”

“There’s a separate category where a website or a publisher or a news organization had explicitly said, ‘do not scrape or crawl me for any other reason than indexing me so that other people can find that content.’ That’s a gray area and I think that’s going to work its way through the courts.”

He was interviewed by Andrew Ross Sorkin of CNBC, which was a media partner of the event.

Further reading

Cloudflare just changed how AI crawlers scrape the Internet-at-Large; permission-based approach makes way for a new business model. Press release. July 1, 2025. Cloudflare

Millions of websites to get ‘gamechanging’ AI bot blocker. Article. July 1, 2025. By Chris Vallance, Senior Technology Reporter. BBC

Trapping misbehaving bots in an AI labyrinth. Article. March 19, 2025. By Reid Tatoris, Harsh Saxena and Luis Miglietti. Cloudflare

BBC threatens AI firm with legal action over unauthorised content use. Article. June 20, 2025. By Liv McMahon, Technology Reporter. BBC

CEO of Microsoft AI speaks about the future of artificial intelligence at Aspen Ideas Festival. Video interview. June 25, 2024. NBC News via YouTube.

Why it matters

Anti-piracy programs such as these not only protect copyrighted content but also help preserve revenue attracted by online advertising.

By generating ‘garbage’ content to ‘confuse’ generative AI scrapers, Cloudflare’s AI Labyrinth is not unlike the software- and code-obfuscation techniques used to thwart attempts by malware developers to penetrate and reverse engineer software apps.

While requests by publishers carry legal backing via copyright law, there is no guarantee that AI bots will respect the rules posted on publisher Web sites.

Such rules can be automated through vehicles such as ‘robots.txt’ files, which present rules directing crawlers not to access or index site content. However, these files are not recognized as legal documents, nor are they universally honored.
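How a well-behaved crawler consults such rules can be sketched with Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot`, the sample rules, and the helper `is_allowed` are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler performs this check before every fetch.
print(is_allowed("ExampleAIBot", "https://example.com/article"))
print(is_allowed("SearchBot", "https://example.com/article"))
```

The check runs entirely on the crawler's side, so a crawler that skips it is not stopped by the file itself.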

Another safeguard is for publishers to post copyright notices, but these too are not universally respected.
