Do we know they didn't download the DB? Maybe the new traffic is the LLM reading the site on demand (not training on it)?
I don't know that LLMs read sites. I only know that when I use one, it tells me it's checking sites X, Y, Z, thinking about the results, checking sites A, B, C, etc. I assumed it was actually reading the sites on my behalf and not just referring to its internal training knowledge.
Like, how are people training LLMs, and how often does each one scrape? From the outside, it feels like the big ones (ChatGPT, Gemini, Claude, etc.) scrape only a few times a year at most.
I would guess site operators can tell the difference between an exhaustive crawl and the targeted, specific traffic I'd expect to see from an LLM checking sources on demand. For one thing, the latter would show time-of-day patterns tied to waking hours in the relevant parts of the world, whereas exhaustive crawl traffic would probably be pretty constant all day and night.
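If a site operator wanted to eyeball this in their own logs, a minimal sketch along these lines might do it. The log path, the combined-log-format regex, and the bot user-agent substrings are all assumptions to swap out for your own setup; a flat hourly histogram suggests an exhaustive crawl, a diurnal one suggests on-demand fetches.

    # Rough sketch: bucket requests by hour of day to see whether traffic
    # from a given class of user agents is flat around the clock (crawler-like)
    # or follows waking hours (on-demand-fetch-like). The log path, log format
    # (nginx/Apache combined), and UA substrings below are all assumptions.
    import re
    from collections import Counter, defaultdict

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder path

    # Timestamp like [10/Oct/2025:13:55:36 +0000]; UA is the last quoted field.
    LINE_RE = re.compile(
        r'\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2}[^\]]*\].*"([^"]*)"\s*$'
    )

    # Crawler UA substrings to look for -- adjust to whatever shows up in your logs.
    BOT_HINTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

    hourly = defaultdict(Counter)  # traffic class -> Counter(hour -> count)

    with open(LOG_PATH, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LINE_RE.search(line)
            if not m:
                continue
            hour, ua = int(m.group(1)), m.group(2)
            klass = "crawler" if any(h in ua for h in BOT_HINTS) else "other"
            hourly[klass][hour] += 1

    # Print an hour-of-day histogram per traffic class.
    for klass, counts in sorted(hourly.items()):
        total = sum(counts.values()) or 1
        print(f"\n{klass} ({total} requests)")
        for h in range(24):
            share = counts[h] / total
            print(f"  {h:02d}:00  {share:6.1%}  {'#' * round(share * 100)}")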
Also, to be clear, I doubt those big players are doing these crawls. I assume it's small startups who think they're going to build a big dataset to sell, or to train their own model.