
> When you find a page [...] and you want to make sure it's available to you later, what do you do?

Instead of doing a bad and lossy job of archiving the page myself, I notify† our friendly neighbourhood archivists at the Internet Archive of the page; and they then do the best, most lossless job of preserving the page that they're able, given their cumulative experience.

http://blog.archive.org/2017/01/25/see-something-save-someth...
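For anyone who wants to script that notification step, here is a minimal sketch in Python. It assumes the Wayback Machine's "Save Page Now" address (`https://web.archive.org/save/` followed by the page URL) still accepts a plain GET request. The function names `build_save_url` and `request_capture` are made up for illustration and are not part of any official client.

```python
from urllib.request import urlopen

# Assumption: the "Save Page Now" prefix; the target URL is appended as-is.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks the Wayback Machine to capture page_url."""
    return SAVE_ENDPOINT + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Send the capture request and return the HTTP status code.

    This performs a real network call and may be rate-limited or refused.
    """
    with urlopen(build_save_url(page_url), timeout=timeout) as response:
        return response.status
```

Calling `request_capture("https://example.com/")` would submit that page for capture. Whether the capture succeeds, and how long it stays available, is up to the archive.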

As a side-benefit, they also then take care of keeping the archive they've made around and available online in perpetuity, with no additional marginal effort on my part. The same can't be said for something in my own "private collection."




This may not be well known, but archive.org can and does remove pages and whole sites from the archive. Authors can request this, and so can site owners (who may be separate from the authors). There may be others who can request it as well.

Just an FYI. If there are critical sites you want copies of, I'd recommend making your own copy. I've lost access to important pages / sites twice before taking this to heart.
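A rough sketch of "making your own copy" in Python, under some loud assumptions: it stores only the raw bytes the server returns (no images, stylesheets, or scripts), and the helper names `fetch_bytes` and `save_local_copy` and the hash-based file naming are illustrative choices, not an established tool.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body for url (real network call)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_local_copy(url: str, dest_dir: str, fetch=fetch_bytes) -> Path:
    """Save the body of url into dest_dir and return the file path.

    The file name is derived from a hash of the URL so that any URL maps
    to a safe, stable name. `fetch` is injectable so the function can be
    exercised without network access.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    path = dest / name
    path.write_bytes(fetch(url))
    return path
```

Something like `save_local_copy("https://example.com/", "archive")` would write one file into an `archive` directory. A page that depends heavily on external assets or JavaScript will not be faithfully preserved this way.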

Edited for clarity


There is value in having a personally curated, offline collection of documents. You can search, annotate or otherwise manipulate it to your heart's content, all without having to be connected.

Of course the Internet Archive serves other purposes for which it is (currently) irreplaceable.


Zotero is much better for this than the too-fiddly print-to-PDF workflow described in the earlier comment.


There's also an opportunity cost in spending time maintaining, indexing, and annotating your own archive of documents.


> in perpetuity

Hopefully it really is around for a very long time, but the world is unpredictable and things change. It's great to enhance the Internet Archive, but you can bet I'm keeping my local copy too. Just in case.


That's suboptimal as well. The site could come out with a new robots.txt file which is just "User-agent: *" followed by "Disallow: /", and everything already indexed by the Internet Archive is now inaccessible to you.
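To see what that two-line robots.txt means to a crawler that honours it, here is a small sketch using Python's standard `urllib.robotparser`. The user agent name `ExampleBot` and the helper `is_allowed` are hypothetical. This only shows how the rule is interpreted, not how the Internet Archive itself applies robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from the comment above, one directive per line.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name used only for illustration.
blocked = is_allowed(BLOCK_EVERYTHING, "ExampleBot", "https://example.com/any/page")
print(blocked)
```

With the blanket rule, every path on the site is off limits to every compliant crawler, which is why a single change to that file can have such a wide effect.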


Do you never get online receipts that you need to keep a copy of?


I don't think I've ever had such a thing that only appeared as a web page, without being emailed to me. To me, the email is the primary-source document in that arrangement.



