The intention of robots.txt is to tell search systems specifically "do not use information from the following pages in building a search index". The Bing toolbar's use of clickstream data from Google, and no doubt many other sites, clearly violates that spirit.

This could easily be fixed by checking the clickstream data against robots.txt files and discarding data that shouldn't be used. Microsoft apparently has decided not to take that step.
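
A minimal sketch of that filtering step, in Python, using the standard library's `urllib.robotparser`. The robots.txt contents, the host names, and the `example-indexer` user-agent token are all hypothetical. A real pipeline would fetch and cache each host's `/robots.txt` rather than hard-code it, and would also have to decide which user-agent's rules should govern toolbar data.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies keyed by host. A real system would
# fetch and cache https://<host>/robots.txt instead of hard-coding it.
ROBOTS_TXT = {
    "www.example-search.com": [
        "User-agent: *",
        "Disallow: /search",
    ],
}

USER_AGENT = "example-indexer"  # hypothetical user-agent token


def build_parsers(robots_txt):
    """Parse each host's robots.txt lines once, up front."""
    parsers = {}
    for host, lines in robots_txt.items():
        rp = RobotFileParser()
        rp.parse(lines)
        parsers[host] = rp
    return parsers


def filter_clickstream(urls, parsers, user_agent=USER_AGENT):
    """Keep only the clicked URLs that robots.txt allows user_agent to fetch."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname
        rp = parsers.get(host)
        # Assumption: a host with no known robots.txt imposes no restrictions.
        if rp is None or rp.can_fetch(user_agent, url):
            kept.append(url)
    return kept


if __name__ == "__main__":
    parsers = build_parsers(ROBOTS_TXT)
    clicks = [
        "https://www.example-search.com/search?q=widgets",  # disallowed path
        "https://www.example-search.com/about",             # allowed path
        "https://unlisted.example.org/page",                # no robots.txt known
    ]
    print(filter_clickstream(clicks, parsers))
```

Run as a script, this drops the `/search` click and keeps the other two URLs. Treating hosts with no known robots.txt as unrestricted mirrors how `can_fetch` treats an empty file; a stricter pipeline could instead discard clicks from any host it has not checked.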

your assumptions:

  - the "intention" of the robots.txt standard is as you state
  - the url is included in the information not allowed by that standard
  - if the url is not included it should be because of the "intention"
  - toolbars are subject to the same standards
I'm not disagreeing with you as much as just pointing out that I don't think EVERYONE agrees on these standards.
