Coin World reported:
Global internet security company Cloudflare, which claims to protect nearly 20% of the world’s internet traffic, has introduced a so-called “easy button” for website owners who want to block artificial intelligence services from accessing their content. The move comes as demand grows for content used to train AI models.
Cloudflare’s core service acts as an internet proxy, scanning and filtering traffic before it reaches websites. The company states that its network receives over 57 million requests per second on average.
In a statement on Wednesday, Cloudflare said, “To help content creators maintain a safe internet, we have just launched a new ‘easy button’ to block all AI bots.” It added, “We have heard clearly that customers do not want AI bots accessing their websites, especially the dishonest ones.”
While some AI companies clearly identify their web crawling bots and respect website directives telling them to stay away, not every company is transparent about its activities.
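The "website directives" mentioned here are typically expressed in a site's robots.txt file. As a minimal sketch of how a well-behaved crawler might check those directives before fetching a page, the example below uses Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the `ExampleBrowser` agent name are assumptions made for illustration; they are not taken from Cloudflare or from any crawler operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: two AI crawlers are told to
# stay away from the whole site, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Bytespider", "ExampleBrowser"):
        verdict = "allowed" if is_allowed(agent, "https://example.com/article") else "disallowed"
        print(f"{agent}: {verdict}")
```

Note that robots.txt is purely advisory: it only works when the crawler chooses to run a check like this one, which is why blocking at the network level matters for operators that ignore it.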
The new simple setting is being offered to all Cloudflare customers, including those on its free tier.
Analyzing AI Bot Activity
Alongside the announcement, Cloudflare also shared a wealth of information about AI crawler activities observed in its system.
According to Cloudflare’s data, in June AI bots accessed about 39% of the top 1 million “internet properties” that use Cloudflare. However, only 2.98% of those properties took measures to block or challenge the requests. Cloudflare also noted, “The higher the ranking of an internet property (the more popular), the more likely it is to become a target for AI bots.”
The company stated that AI crawlers operated by TikTok owner ByteDance, Amazon, Anthropic, and OpenAI were the most active. The top crawler was Bytespider by ByteDance, ranking first in request volume, activity scope, and block frequency. GPTBot, managed by OpenAI for collecting training data for products like ChatGPT, ranked second in crawling activity and blocks.
Perplexity’s Crawler
The crawler operated by Perplexity, which recently caused controversy over its content-scraping behavior, was detected accessing only a small portion of the websites protected by Cloudflare.
Website owners can implement their own rules to block known web crawlers, but Cloudflare said that most of its customers block only the more mainstream AI developers such as OpenAI, Google, or Meta, rather than the most active crawlers from ByteDance or other companies.
AI vs. AI
Cloudflare’s report highlighted how some AI bot operators use deceptive tactics to get around blocking measures, attempting to disguise their crawler activity as legitimate web traffic.
Cloudflare wrote, “Unfortunately, we observed bot operators attempting to disguise themselves as legitimate traffic by using forged user agents that look like a real browser.”
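To see why forged user agents defeat simple blocking rules, consider a minimal sketch of a blocklist that matches on the User-Agent header alone. The token list, the `is_declared_ai_crawler` helper, and both sample header strings are simplified assumptions for illustration; this is not how Cloudflare's detection works.

```python
# Lowercased tokens that some AI crawlers include in their User-Agent
# header. The list is illustrative, not exhaustive.
DECLARED_AI_CRAWLERS = ("gptbot", "bytespider", "claudebot", "amazonbot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the header contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in DECLARED_AI_CRAWLERS)


# Simplified, hypothetical header from a crawler that identifies itself.
HONEST_UA = "ExampleFetcher/1.0 (compatible; GPTBot/1.0)"

# Hypothetical forged header that imitates a desktop browser.
FORGED_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_declared_ai_crawler(HONEST_UA))  # the self-identified crawler is caught
    print(is_declared_ai_crawler(FORGED_UA))  # the forged header slips through
```

Because the header is entirely under the client's control, a rule like this only stops operators who identify themselves honestly, which is the gap that behavior-based detection is meant to close.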
Indeed, AI is a key tool for the company in blocking automated activity, whether it comes from AI developers, search engines, or malicious attackers. Cloudflare stated that it uses machine learning models to assign a “bot score” to each request made to sites protected by its services, with a low score indicating a low likelihood that the request comes from a legitimate human visitor.
Drawing on its vast dataset of global internet traffic, the model weighs many signals, including request IP addresses, user agents, and behavioral patterns, to determine the bot score.
To illustrate this point, Cloudflare stated that it studied the traffic of a specific bot known for evasive behavior. The results were compelling: all detected scores were below 30 (out of 100), with the majority falling into the bottom two brackets, meaning scores of 9 or below. In other words, even when bots attempt to conceal their origins, their activity patterns give them away, allowing Cloudflare to block them.
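Once each request carries a score, the blocking decision itself can be very simple. The sketch below is a hypothetical illustration of thresholding on such a score: the `should_block` function, the cutoff of 30 (borrowed from the "below 30" figure above), and the sample scores are all assumptions, not Cloudflare's actual rules or data.

```python
# Assumed cutoff for this example, borrowed from the "below 30" figure
# in the text. It is not a documented Cloudflare default.
BLOCK_THRESHOLD = 30


def should_block(bot_score: int, threshold: int = BLOCK_THRESHOLD) -> bool:
    """Return True when a score is low enough to treat the request as automated."""
    if not 0 <= bot_score <= 100:
        raise ValueError(f"bot score out of range: {bot_score}")
    return bot_score < threshold


# Hypothetical scores for requests from an evasive crawler.
SAMPLE_SCORES = [2, 5, 9, 14, 27]

if __name__ == "__main__":
    for score in SAMPLE_SCORES:
        action = "block" if should_block(score) else "allow"
        print(f"score={score:3d} -> {action}")
```

In this setup the forged user agent no longer matters: the decision depends only on the score, which the text says is derived from many signals including behavior.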
Protecting Web Content
Generative AI models rely on a vast amount of existing content, much of which is collected from the web. For AI to keep providing current information, developers need to keep collecting that content on a large scale.
As large publishers such as news organizations take legal action against AI companies, website owners and content creators are fighting back as well. In the recent controversy over Perplexity’s crawler, publications including Forbes and Wired alleged that their content had been accessed and republished without authorization. Music publisher Sony warned more than 700 tech companies in May to stay away from its content, and Warner Music Group took similar action this week.
If AI increasingly serves information to users without disclosing its sources, it could pose an existential threat to publishers. A recent study by SparkToro CEO Rand Fishkin found that roughly 60% of people searching for information on Google never click through to the websites that supply it, because Google’s AI immediately provides a summarized answer.
Edited by Ryan Ozawa.