
On a website I'd written, we had pseudo-randomly generated URLs to show dynamic content (it was a game; the URL contained the parameters). On each page we had a little widget that suggested five random configurations people might like to try.
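To sketch the pattern (every name, range, and URL below is invented for illustration; the comment doesn't say what the real scheme was), the widget effectively generated links like this:

    import random
    from urllib.parse import urlencode

    def random_config_url(base="https://example.com/play"):
        """Build a game URL with pseudo-random parameters (all hypothetical)."""
        params = {
            "seed": random.randrange(2**32),
            "size": random.choice(["small", "medium", "large"]),
            "difficulty": random.randint(1, 10),
        }
        return f"{base}?{urlencode(params)}"

    # Five random suggestions for the sidebar widget.
    suggestions = [random_config_url() for _ in range(5)]

Each page linking to five fresh random URLs is exactly what makes the URL space look effectively infinite to a crawler.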

A few times our website went down with the load average above 30. Eventually I discovered Google was doing something funky; disallowing the dynamic URLs in robots.txt fixed the issue. Then some other search engines and scrapers ran into the same issue and started requesting hundreds of thousands of URLs per day (these pages were generated dynamically and took a moderate amount of compute).
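The robots.txt fix amounts to a Disallow rule covering the dynamic URLs; a minimal version (the /play/ prefix is hypothetical, since the real URL scheme isn't given) looks like:

    User-agent: *
    Disallow: /play/

Well-behaved crawlers like Googlebot honor this; scrapers that ignore robots.txt need the anti-scraper rules mentioned below.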

We eventually did have to implement basic anti-scraper rules, because the crawler traffic was degrading the user experience.
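A minimal sketch of such a rule, assuming a per-IP sliding-window limit (the window and budget numbers are made up; the site's actual rules aren't described):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60   # sliding window length (hypothetical)
    MAX_REQUESTS = 30     # per-IP budget within the window (hypothetical)

    _hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow_request(ip, now=None):
        """Return True if this IP is still under its request budget."""
        now = time.monotonic() if now is None else now
        window = _hits[ip]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS:
            return False  # over budget: e.g. serve HTTP 429 or a cached page
        window.append(now)
        return True

Anything expensive to render, like these dynamically generated game pages, is a good candidate for checking allow_request() before doing the work.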


