
Quote: "..even if you instruct it to begin archiving a site then it can easily fail if that site’s robots.txt prevents crawling"

Huh? Do the big corporations actually care about robots.txt anymore? Nowadays it's more of a "netiquette" thing than anything else. Google definitely ignores it. Dunno what DuckDuckGo does.
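
For anyone wondering what "robots.txt prevents crawling" means in practice, here is a rough sketch of the check a well-behaved crawler might run before fetching a page. It uses Python's standard urllib.robotparser; the rules, the "example-archiver" user agent, the example.com URLs and the may_fetch helper are all made up for illustration, and I have no idea whether the archiver in the quote works this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download them first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public", "https://example.com/private/page"):
        print(url, may_fetch(ROBOTS_TXT, "example-archiver", url))
```

Nothing enforces this: the check is voluntary, so a crawler that skips it can fetch the page anyway.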


