It's been quite a while since I last did web-scraping (I used to use BeautifulSoup, more than a decade ago).
I'm just wondering: since a lot of people now use fairly advanced cloud-hosting solutions with, I assume, anti-spam and anti-bot tooling offered by their hosting provider, is web scraping a lot different from what it was about a decade ago? What steps do you guys take to avoid being identified as a bad actor by the site you're scraping?
And on the other end, if you have a data-rich website, what are your feelings toward aggressive scrapers?
Bot-mitigation services like Distil Networks and CDNs like Cloudflare make scraping more difficult than it used to be. If you get caught by them, you can end up blocked from all of the sites they protect, not just the one you were scraping.
Writing some scrapers this week, I noticed it's also common for the origin server to simply check whether the request is coming from a VPN/VPS IP address range.
For example, the exact same request will work from your home connection but fail from EC2.
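For anyone curious what that check looks like on the server side, here's a minimal sketch. The CIDR blocks and the `looks_like_datacenter` helper are illustrative assumptions, not any particular provider's implementation; real setups usually pull published cloud ranges (e.g. AWS's ip-ranges.json) or do ASN lookups.

```python
import ipaddress

# Illustrative datacenter/VPS CIDR blocks (hypothetical selection; real
# deployments load published cloud ranges or use an ASN lookup service).
DATACENTER_RANGES = [
    ipaddress.ip_network("3.0.0.0/8"),      # example EC2-style block
    ipaddress.ip_network("34.192.0.0/12"),  # example EC2-style block
]

def looks_like_datacenter(client_ip: str) -> bool:
    """Return True if the client IP falls inside a known VPS/cloud range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DATACENTER_RANGES)

# The same request gets flagged from EC2 but passes from a home ISP address.
print(looks_like_datacenter("3.15.20.7"))   # True  (inside an example block)
print(looks_like_datacenter("68.44.12.9"))  # False (residential-looking IP)
```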
It's gotten a lot more challenging than it used to be.
A lot of small things... but basically if you load pages through an actual (headless) browser and cycle IPs, it's pretty hard for a site to pinpoint you as a bot rather than a user. A sketch of that setup is below.
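To make that concrete, here's a minimal sketch of the headless-browser-plus-rotating-IP approach using Playwright. The proxy URLs are placeholders I made up; in practice they'd come from a rotating/residential proxy provider, and this isn't presented as the commenter's actual setup.

```python
import random
from playwright.sync_api import sync_playwright

# Hypothetical proxy pool; swap in endpoints from your proxy provider.
PROXIES = [
    "http://proxy-a.example.com:8000",
    "http://proxy-b.example.com:8000",
]

def fetch(url: str) -> str:
    """Load a page in headless Chromium through a randomly chosen proxy."""
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={"server": random.choice(PROXIES)},
        )
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html

if __name__ == "__main__":
    # Each call picks a fresh proxy, so repeated requests come from different IPs.
    print(len(fetch("https://example.com")))
```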