What Gary Illyes Said About How Crawlers Have Changed

In a recent episode of Google’s "Search Off the Record" podcast, Gary Illyes talked about how search engine crawlers (like Googlebot) have changed over time. Martin Splitt asked him, “Have crawlers changed how they work?”


Gary replied:


Crawling itself hasn’t changed much, but a few things have improved:


1. HTTP Versions Got Better


  • In the past, crawlers used HTTP/1.1.

  • Now we have HTTP/2, and soon it’ll be HTTP/3.

  • Googlebot doesn’t support HTTP/3 yet, but it probably will, because it’s faster and more efficient.

  • With HTTP/2 (and HTTP/3), a crawler can send multiple requests over a single connection, which makes crawling faster and smoother.
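To see why that single-connection multiplexing matters, here's a rough back-of-the-envelope sketch in Python. The latencies are made-up numbers for illustration, not measurements, and real HTTP/2 behavior has more nuance (stream limits, prioritization):

```python
# Rough model of why multiplexing helps a crawler. Over a single HTTP/1.1
# connection (without pipelining), requests are served one after another;
# over HTTP/2, streams share the connection concurrently, so the connection
# is busy roughly as long as the slowest response. Latencies are invented.
response_times_ms = [300, 500, 200, 400]  # per-resource latency (assumed)

# HTTP/1.1: each request waits for the previous one to finish.
http1_total_ms = sum(response_times_ms)

# HTTP/2: multiplexed streams overlap, so wall time ~ the slowest stream.
http2_total_ms = max(response_times_ms)

print(f"HTTP/1.1, one connection: {http1_total_ms} ms")  # 1400 ms
print(f"HTTP/2, one connection:   {http2_total_ms} ms")  # 500 ms
```

In practice, HTTP/1.1 clients compensate by opening several connections in parallel, and that per-server connection overhead is exactly what HTTP/2 and HTTP/3 reduce.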


2. Header & Protocol Updates


  • Headers (the extra info sent along with requests and responses) used to work differently; they've become more advanced over time.

  • Also, the robots.txt protocol is super old but still in use; it tells crawlers what they can and can't access.
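As a concrete example of that protocol in action, here's a minimal sketch using Python's standard `urllib.robotparser`. The rules and the crawler name are illustrative, not any real site's policy:

```python
# A well-behaved crawler consults robots.txt before fetching a URL.
# This parses a tiny example policy and checks two paths against it.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "MyCrawler" is a hypothetical bot name for this sketch.
print(parser.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
print(parser.can_fetch("MyCrawler", "https://example.com/private/page"))  # False
```

In a real crawler you'd fetch the live file with `parser.set_url(...)` and `parser.read()` instead of parsing an inline string.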


3. Spammers & Scammers


  • Crawlers now deal with more spam and shady behavior online.

  • Gary mentioned there are “adversarial crawlers” — like malware scanners or privacy tools — that try to hide what they’re doing so they don’t get blocked. They’re not always malicious, but they’re still tricky to handle.


4. AI & New Bots


  • New AI tools and services are also doing crawling now.

  • Google is trying to reduce its crawling load on the internet…

  • But when new tools pop up, they end up increasing it again.

  • Gary said the real heavy lifting isn't crawling; it’s indexing and processing the data. That’s where the real cost is.


5. Different Companies, Different Rules


  • Every crawler (like Googlebot, Bingbot, etc.) follows its own policies.

  • Good crawlers usually respect the robots.txt file and avoid overloading servers.

  • But tools like malware scanners may follow different rules or try to go unnoticed.
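Those per-crawler policies show up in robots.txt itself: a site can publish different rules for different user agents, and a well-behaved crawler applies only the group that matches its own name. A small sketch using Python's standard `urllib.robotparser` (the rules here are made up, not any real site's policy):

```python
# robots.txt groups rules by User-agent, so different crawlers can be
# granted different access. A compliant bot honors only its own group,
# falling back to the "*" group if nothing matches its name.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/news/latest"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/drafts/post"))    # False
# "SomeOtherBot" is a hypothetical crawler; it falls into the "*" group.
print(parser.can_fetch("SomeOtherBot", "https://example.com/news/latest")) # False
```

Whether a given bot actually checks these rules at all is up to its operator — which is exactly the point about every company setting its own policies.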


Podcast Timestamp:


This conversation happens around the 23:23 mark in the episode.



TL;DR:


Crawling hasn’t changed that much at its core. But with updates like HTTP/2, smarter bots, and AI tools, things are evolving. Each company runs crawlers a bit differently, and that’s what makes the web crawling ecosystem more complex today.

