
I don't understand: why only 20%? If the traffic is from known scrapers, why can't you just render "scrape off", i.e. easily get rid of them?

And traffic from good scrapers is, of course, pretty much impossible to measure, so you don't know what percentage of scraper traffic you got rid of in total.




If the scraper gets back nothing, its operator knows it's been spotted and will make adjustments; an empty response is easy to check for automatically. If you instead alter the page to feed them garbage, it takes longer to notice.
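A minimal sketch of the idea (Flask, with a hypothetical user-agent fingerprint list; the route, names, and decoy data are assumptions, not anyone's production setup): return a normal 200 with plausible-looking garbage instead of a 403 that tips the scraper off.

    # Sketch: serve decoy data to fingerprinted scrapers, real data to everyone else.
    import random

    from flask import Flask, request

    app = Flask(__name__)

    # Hypothetical: user-agent substrings you've already tied to scrapers.
    KNOWN_SCRAPER_AGENTS = {"python-requests", "scrapy", "curl"}

    DECOY_TITLES = ["Widget A", "Widget B", "Widget C"]

    def looks_like_known_scraper(req) -> bool:
        ua = (req.headers.get("User-Agent") or "").lower()
        return any(sig in ua for sig in KNOWN_SCRAPER_AGENTS)

    def real_product_list():
        # Stand-in for your actual data source.
        return [{"title": "Real Widget", "price": 19.99}]

    @app.route("/products")
    def products():
        if looks_like_known_scraper(request):
            # 200 OK with garbage, not a 403: nothing for the scraper to alert on.
            fake = [{"title": random.choice(DECOY_TITLES),
                     "price": round(random.uniform(1, 100), 2)}
                    for _ in range(20)]
            return {"products": fake}
        return {"products": real_product_list()}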


It's easy for someone looking at logs to say "ok, this is very likely automated scraping", but it can be much harder to automate that detection. In the same way that porn is obvious to a human, but not to a computer.


It's not magic, though, and surely can be automated.

In fact, I would argue that your time might be better spent on that, instead of on randomizing CSS classes. If you end up building something worthwhile, it could be a great product too! (Look at all those CDN / anti-DDoS platforms; it sounds like they could've been started this exact way.)
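For what it's worth, a toy sketch of automating that judgment from access logs (the log path, format, and thresholds are assumptions; real detection would need far more signal): score each IP by request volume and breadth of distinct URLs, since a human session rarely touches hundreds of pages.

    # Sketch: flag IPs whose rate and path diversity look nothing like a human.
    import re
    from collections import defaultdict

    # Matches the start of a common/combined log line: IP, timestamp, request path.
    LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)')

    def flag_scrapers(log_path="access.log", min_requests=1000, min_paths=200):
        hits = defaultdict(int)
        paths = defaultdict(set)
        with open(log_path) as f:
            for line in f:
                m = LOG_LINE.match(line)
                if not m:
                    continue
                ip, _ts, path = m.groups()
                hits[ip] += 1
                paths[ip].add(path)
        return [ip for ip in hits
                if hits[ip] >= min_requests and len(paths[ip]) >= min_paths]

    if __name__ == "__main__":
        for ip in flag_scrapers():
            print(ip)

The point being: an IP pulling thousands of pages across hundreds of distinct URLs behaves nothing like a person, and that's cheap to score automatically.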




