I am the journeyer from the valley of the dead Sega consoles. With the blessings of Sega Saturn, the gaming system of destruction, I am the Scout of Silence… Sailor Saturn.

Cake day: June 29th, 2023

  • Edit: But also - why do AI scrapers request pages that show differences between versions of wiki pages (or perform other similarly complex requests)? What’s the point of that anyway?

    This is just naive web crawling: Crawl a page, extract all the links, then crawl all the links and repeat.
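Here's a minimal Python sketch of that loop. It assumes a hypothetical `fetch(url)` callable that returns HTML (or `None`), so it runs without a network. Nothing in it checks robots.txt or asks whether a URL is expensive to serve: a diff link is just another string in the queue.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    # Resolve relative links and drop #fragments so they don't count as new URLs.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def naive_crawl(start_url, fetch, max_pages=1000):
    """Crawl a page, extract all the links, crawl those links, repeat.

    `fetch` is a hypothetical callable (url -> HTML string or None).
    No robots.txt check, no idea which URLs are "complex" to serve.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if html is None:
            continue
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Fed a tiny fake wiki where the main page links to a history view and the history view links to a diff, it walks straight into the diff, because to the crawler that's just the next URL in the queue.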

    Any crawler that doesn’t know what they’re doing and doesn’t respect robots.txt, but still wants to crawl an entire domain, will end up following these sorts of links naturally. It has no sense that the requests are “complex”, just that it’s fetching a URL with a few more query parameters than the one it started with.
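For contrast, here's a sketch of the check a polite crawler makes before each fetch, using Python's standard `urllib.robotparser`. The robots.txt contents and the `/wiki/` vs `/w/index.php` URL layout are hypothetical, loosely modeled on MediaWiki-style sites and not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: article views live under /wiki/, while history and
# diff views go through /w/index.php with extra query parameters.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def make_policy(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(policy: RobotFileParser, url: str, user_agent: str = "ExampleBot") -> bool:
    """A polite crawler calls this before every fetch; the naive one never does."""
    return policy.can_fetch(user_agent, url)


if __name__ == "__main__":
    policy = make_policy(ROBOTS_TXT)
    for url in (
        "https://wiki.example/wiki/Main_Page",
        "https://wiki.example/w/index.php?title=Main_Page&diff=2&oldid=1",
    ):
        print(url, "->", "fetch" if allowed(policy, url) else "skip")
```

The rule is a plain path prefix, so it catches every diff and history URL regardless of how many query parameters get tacked on.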

    The article even alludes to how to take advantage of this with its “trap the bots in a maze of fake pages” suggestion. Even crawlers that know what they’re doing will sometimes struggle with infinite URL spaces.
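To make the infinite-URL-space problem concrete, here's a sketch with a hypothetical `/calendar?month=N` page that always links to month N+1, so there is no last page. The two guards (a depth limit and a cap on query-string variants per path) are illustrative heuristics, not something the article prescribes. A maze that mints brand-new paths would slip past the per-path cap, and only the depth limit would stop it.

```python
import re
from collections import Counter, deque
from urllib.parse import parse_qs, urljoin, urlsplit

# Crude link extraction; good enough for the synthetic pages below.
HREF_RE = re.compile(r'href="([^"]*)"')


def calendar_page(url):
    """Hypothetical trap: /calendar?month=N always links to month N+1."""
    month = int(parse_qs(urlsplit(url).query).get("month", ["0"])[0])
    return f'<a href="/calendar?month={month + 1}">next month</a>'


def guarded_crawl(start_url, fetch, max_depth=5, max_variants_per_path=3, max_pages=10_000):
    """BFS crawl with two illustrative guards:
    - don't expand links more than max_depth hops from the start URL;
    - visit at most max_variants_per_path distinct URLs per path (query ignored).
    """
    variants = Counter()
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        path = urlsplit(url).path
        if variants[path] >= max_variants_per_path:
            continue  # this path already produced too many query-string variants
        variants[path] += 1
        visited.append(url)
        if depth == max_depth:
            continue  # fetched, but its links are not followed
        for href in HREF_RE.findall(fetch(url) or ""):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Starting from `month=0`, the per-path cap stops the walk after three months; with that cap lifted, the depth limit still ends it after six pages. Without either guard, only `max_pages` would stop it.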