
You’d have to trust that the data being dumped was 100% identical to the actual pages users would eventually see, or you could end up with very weird (including dangerous) behavior

Of course, I know that some version of this can and does occur with classic web scraping too, but that is an arms race that a search engine can win




> I know that some version of this can and does occur with classic web scraping too, but that is an arms race that a search engine can win

Cloaked links and cloaked ads still happen on direct requests, too -- a search engine's crawlers come from widely known IP ranges (and if they start using unknown or new IPs, those become known soon enough), so even spoofing the bot's user agent isn't a reliable workaround.

I'd say the arms race is still escalating; though I've been out of that game for a little while, I'm still rather sure of that.
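
A minimal sketch of the naive check this is arguing against: fetch the same URL with a crawler-style User-Agent and a browser-style one and compare. The URL and User-Agent strings are illustrative, not real crawler identifiers. As the comment notes, a site that cloaks based on known crawler IP ranges will serve both of these requests the same version, so a mismatch is only a weak signal and a match proves nothing.

  import hashlib
  import urllib.request

  CRAWLER_UA = "ExampleBot/1.0 (+https://example.com/bot)"   # hypothetical bot UA
  BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"   # hypothetical browser UA

  def fetch(url: str, user_agent: str) -> bytes:
      # Plain HTTP fetch with an overridden User-Agent header.
      req = urllib.request.Request(url, headers={"User-Agent": user_agent})
      with urllib.request.urlopen(req, timeout=10) as resp:
          return resp.read()

  def looks_cloaked(url: str) -> bool:
      # Compare hashes of the two responses. Dynamic pages (timestamps, ads)
      # will differ between any two fetches, so a real check would normalize
      # or extract content first rather than compare raw bytes.
      bot_hash = hashlib.sha256(fetch(url, CRAWLER_UA)).hexdigest()
      browser_hash = hashlib.sha256(fetch(url, BROWSER_UA)).hexdigest()
      return bot_hash != browser_hash

  if __name__ == "__main__":
      print(looks_cloaked("https://example.com/"))
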


You can just spot check a tiny fraction of the data to validate this; if it doesn't match, the site gets blocked.
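
A minimal sketch of that spot-check idea, assuming the dump is a mapping of URL to page text and that some fetch_live() helper exists. The sample rate, mismatch threshold, and exact-equality comparison are all hypothetical choices for illustration; a real pipeline would normalize dynamic content before comparing.

  import random
  from typing import Callable, Dict

  def spot_check(dump: Dict[str, str],
                 fetch_live: Callable[[str], str],
                 sample_rate: float = 0.01,
                 max_mismatch: float = 0.05) -> bool:
      """Return True if the sampled dump entries match the live pages closely
      enough; False means the dump is distrusted and the site gets blocked."""
      if not dump:
          return False
      urls = list(dump)
      sample_size = max(1, int(len(urls) * sample_rate))
      mismatches = 0
      for url in random.sample(urls, sample_size):
          # Exact comparison for simplicity; real checks would strip
          # timestamps, ads, and other expected variation first.
          if fetch_live(url) != dump[url]:
              mismatches += 1
      return (mismatches / sample_size) <= max_mismatch
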



