
OP mentions using robots.txt to avoid crawling, but even Google ignores this now, correct?

1. https://www.searchenginejournal.com/google-robots-txt-noinde...



OP here. I'm not sure about the details in your link, but basically my understanding lines up with [1]; robots.txt isn't guaranteed to be respected, but generally is.

FWIW, what I specifically have in robots.txt is:

    User-agent: *
    Disallow: /

which seems to work well for me so far (i.e., I do not find my house documentation site on any search engine).

[1]: https://developers.google.com/search/docs/crawling-indexing/...
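
For anyone who wants to sanity-check what those two lines ask of a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.invalid` URL are placeholders of mine, not anything from my actual setup, and this only shows what a compliant parser concludes; it says nothing about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same two rules quoted above: every user agent, everything disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under `robots_txt`.

    `is_allowed` is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; swap in a real path on your own site.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.invalid/docs"))
```

With the rules above, a compliant parser should report every path as off limits for every user agent, while an empty robots.txt allows everything.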


If I understand the link correctly, Google dropped support for one particular robots.txt feature that had always been undocumented and unsupported.

I think the point of it was that you could tell Google to crawl some pages (to follow their links) but not index them?



