A Love Letter to the People Who Build the Internet Archive (archive.org)
164 points by jmsflknr on Feb 14, 2019 | 36 comments



I once rebuilt an entire content site from the internet archive after a catastrophic data loss.


For me Google Cache was helping out under similar circumstances (also tried the Archive first, but apparently Google Cache was [at least back then] more up-to-date).


Now you mention it, I think we used a combination of the two. Google's cache filled in some of the blanks.


the hero we need but don't deserve


(: forget The Avengers, the true heroes are The Archivers


I wish they had a list of the torrents they most need seeders for. Right now, if I want to seed their stuff (say 100 items), I have to go through their site and, one by one, select files, give each to a torrent client, and see how healthy the swarm is. Rinse and repeat.


even better:

a binary you can execute. On first launch you allocate space and bandwidth. Thereafter, it stays within the configured parameters and just seeds whatever the archive wants you to seed.


You might find people in the Archive Team interested in working on this. Their actual mission is slightly different but I'm sure there's overlap of interests somewhere.


Agreed. Make it easy to help.


This is something i'll hack on this weekend


sounds scriptable; cool weekend project?
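
A minimal sketch of the selection step, assuming you already have a list of candidate torrents with sizes and seeder counts. How that list is obtained is left open, and the `Torrent` record, the item names, and the numbers are all hypothetical. It takes the least-seeded items first and stops adding anything that would exceed the space you allocated; bandwidth limits would be left to the torrent client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Torrent:
    """Hypothetical record for one candidate torrent."""
    name: str
    size_bytes: int
    seeders: int


def pick_torrents(candidates, budget_bytes):
    """Greedily choose the least-seeded torrents that fit in the space budget."""
    chosen = []
    remaining = budget_bytes
    # Fewest seeders first; ties go to the smaller item.
    for torrent in sorted(candidates, key=lambda t: (t.seeders, t.size_bytes)):
        if torrent.size_bytes <= remaining:
            chosen.append(torrent)
            remaining -= torrent.size_bytes
    return chosen


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Made-up swarm data, for illustration only.
    swarm = [
        Torrent("item-a", 4 * GIB, 0),
        Torrent("item-b", 3 * GIB, 12),
        Torrent("item-c", 2 * GIB, 1),
        Torrent("item-d", 5 * GIB, 2),
    ]
    for torrent in pick_torrents(swarm, 8 * GIB):
        print(torrent.name, torrent.seeders)
```

Greedy selection is not an optimal packing, but it keeps the priority obvious: the items closest to having no seeders get the space first.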


It's not just fair use. 17 U.S.C. sec. 107.

Although the United States Court of Appeals has determined that HathiTrust's and Google's massive derivative works are fair use--

Authors Guild, Inc. v. HathiTrust, No. 12-4547-cv [755 F.3d 87] (USCA-02 2014) ([I]n providing this service, the HDL does not add into circulation any new, human‐readable copies of any books. Instead, the HDL simply permits users to “word search”—that is, to locate where specific words or phrases appear in the digitized books. Applying the relevant factors, we conclude that this use is a fair use.)

Authors Guild v. Google, Inc., No. 13-4829-cv (USCA-02 2015) ([W]e conclude that Google’s making of a complete digital copy of Plaintiffs’ works for the purpose of providing the public with its search and snippet view functions (at least as snippet view is presently designed) is a fair use and does not infringe Plaintiffs’ copyrights in their books.)

For libraries and archives, there is also 17 U.S.C. sec. 108 (Limitations on exclusive rights: Reproduction by libraries and archives), a broad exemption that the Internet Archive would likely assert [and likely satisfy].


A year ago, I got very interested in a science-history topic that spans much of the 19th-century. Within a couple of days, I managed to find dozens of rare, original, on-topic, illustrated, computer-searchable source volumes on IA.

If there's another single library in the world that has all of those (PD) works, it's probably far away from the chair I never had to leave.


The publication of the Internet Archive's self-congratulatory love letter this morning is no coincidence. Note the following paragraph:

Libraries are built by people, for people. Thank you so much to all of the people who have contributed to building the Internet Archive, whether they were employees or our huge group friends and family. We would not be here without you, and we hope you will continue to help bring universal access to all knowledge in the future.

Why did the Internet Archive publish this today?

Yesterday, the Authors Guild, the National Writers Union, the National Association of Science Writers, the Association of American Publishers, the Federation of European Publishers, and many more international writing and publishing organizations published an open appeal to readers and librarians concerning "Controlled Digital Lending" (CDL). A link to the appeal and accompanying FAQ can be found here (https://nwu.org/nwu-denounces-cdl/). Here's the text of the appeal:

As working writers, translators, photographers, and graphic artists; as unions, organizations, and federations representing the creators of works included in published books; as book publishers; and as reproduction rights and public lending rights organizations; we oppose so-called “Controlled Digital Lending” (CDL) as a flagrant violation of copyright and authors’ rights.

The copyright infringement inherent in CDL is not a victimless crime. As the victims of CDL, we want librarians, archivists, and readers to understand how they are harming the authors of the books they love by participating in CDL projects, even if they have the best of intentions.

The attached FAQ was written to explain to authors, publishers, readers, librarians, and archivists what CDL is, how it differs from traditional and legitimate new forms of library lending, how it violates the economic and moral rights of authors, and how it makes it even harder for authors to try to make a living from writing or to afford to devote time to writing.

When writers can’t make a living, they can’t afford to keep writing, and readers lose too.

Well-meaning librarians, archivists, and readers, who don’t intend to deprive authors of their livelihoods, are being misled by false claims from proponents of CDL. Under CDL, printed books are being scanned and distributed online to readers worldwide by the Internet Archive and U.S. and Canadian libraries.

CDL is not comparable to lending of physical books by libraries. CDL is not “fair use” as defined in U.S. copyright law, and an exception to or limitation of copyright to allow CDL without permission or remuneration would not be permitted by the Berne Convention on Copyright. CDL interferes with many of the normal ways, including new ways largely unnoticed by librarians, that authors are earning money from written and graphic works included in so-called “out of print” books. There is no basis for a good-faith belief that CDL is legal under either U.S. or international law.

We appeal for a dialogue among writers, authors, publishers, and librarians on how to enable and create the digital libraries we all want, in ways that fully respect authors’ rights.

As an author and a publisher, I would go one step further and demand that scanning under CDL be immediately halted until issues related to copyright, compensation, and takedown are worked out and implemented. (Disclosure: I also serve on the board of the Independent Book Publishers Association which is a cosigner of the appeal and issued a position statement that calls for scanning and distribution of in-copyright works be stopped immediately, see https://www.ibpa-online.org/news/438161/IBPA-Position-Statem...)

Right now, you can go to the IA's website https://openlibrary.org/ and see what's available - many thousands of titles written by authors who are still alive and who won't receive a cent in royalties or licensing fees when someone "borrows" their books or embeds them on a website. According to the Open Library's vision page (https://openlibrary.org/about/vision), "The ultimate goal of the Open Library is to make all the published works of humankind available to everyone in the world."

Like many people on HN, I regularly use IA to track the history of websites or download out-of-copyright and public domain works. Sharing this information is important and should be continued. I also believe in the concept of "Fair Use" for sharing and discussing excerpts of more current works. But when it comes to outright republishing of in-copyright printed works, the rights of creators and publishers need to be recognized. CDL, as currently implemented, fails to give such recognition, let alone compensate creators and publishers.


1) you should really explain what CDL is earlier in the post. I’d never heard of it before.

2) still don’t see the connection to the Internet Archive.

3) isn’t CDL covered by the Exhaustion Doctrine / First Sale Doctrine? [0]

[0] https://en.wikipedia.org/wiki/Exhaustion_of_intellectual_pro...


According to [1], at least, "first sale doctrine does not apply to electronic books".

[1] https://en.wikipedia.org/wiki/First-sale_doctrine#Applicatio...


IA's definition of CDL and yours (by proxy of the NWU faq you linked) seem quite different.

IA contends that the CDL program lends one digital copy per physical copy the lender owns (ie. the “owned to loaned” ratio). Which is in direct contradiction to the copies-of-copies described by the NWU.

Can you shed light on this discrepancy?
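
To make the "owned to loaned" ratio concrete, here is a rough sketch of the bookkeeping it implies. This is not IA's actual implementation; the `LendingLedger` class and its method names are hypothetical. The only rule it enforces is that the number of outstanding digital loans never exceeds the number of physical copies owned.

```python
class LendingLedger:
    """Hypothetical ledger enforcing loans <= owned physical copies for one title."""

    def __init__(self, owned_copies):
        if owned_copies < 0:
            raise ValueError("owned_copies must be non-negative")
        self.owned_copies = owned_copies
        self._borrowers = set()

    @property
    def on_loan(self):
        """Number of digital loans currently outstanding."""
        return len(self._borrowers)

    def checkout(self, borrower):
        """Lend a copy if one is free; return True on success."""
        if borrower in self._borrowers:
            return False  # this borrower already holds a loan
        if self.on_loan >= self.owned_copies:
            return False  # every owned copy is already lent out
        self._borrowers.add(borrower)
        return True

    def checkin(self, borrower):
        """Mark a loan as returned, freeing one copy."""
        self._borrowers.discard(borrower)


if __name__ == "__main__":
    ledger = LendingLedger(owned_copies=1)
    print(ledger.checkout("alice"))  # first loan succeeds
    print(ledger.checkout("bob"))    # refused until alice returns the copy
```

The sketch only tracks the count of loans; it says nothing about what happens to the files on a borrower's machine.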


AIUI, the NWU faq explains that even after a loaned copy is marked as "returned", the page images are present on the borrower's computer (at least in the browser cache). So there's not a 1:1 owned to loaned ratio: each "loan" effectively creates a new digital copy, which the borrower might retain indefinitely.


This seems like a bit of a cop out. You could borrow a normal book from a library and photocopy it as well. It seems like IA is doing everything they can to try not to exceed the 1:1 loan ratio.

I've been recently working on a project in digital book distribution for libraries, and the terms that publishers want to force on libraries for digital distribution are pretty harsh and enforced by DRM by the publisher. Things like you can only loan this book 50 times before renewing its license.

IMO that's what this is really about: trying to undermine the clever workaround of digitizing physical works and loaning them like a normal library would, so that it can be replaced with a more revenue-friendly model where publishers can dictate terms using DRM.


> You could borrow a normal book from a library and photocopy it as well.

Or, more realistically, you could borrow an audiobook or CD and copy it perfectly - but people don't seem to be overly opposed to allowing libraries to lend those out.


As someone who went to college and regularly saw professors checking out textbooks from the college library and photocopying questions/articles to distribute to their classes, the physical model doesn't seem like it actually offers many protections here.


IIRC fair use may cover that if it's only small portions of a work or significantly transformative, and not for profit. There was also an exemption from the DMCA for educators in recent years that may be relevant.


CDL sounds great. Writers won’t stop writing in any case. And if they did, well, there are already too many books to read in a lifetime.

It’s only natural that authors/writers use their skills and connection to publish moralizing defenses of current copyright laws. It’s in their own interest. It’s also easy to see through.


So, your work is so mediocre (at best) that people will not buy it. That's why you demonstrate against CDL, because you think you might have sold five copies more if people would not be able to lend it. I see, I see...


Thanks for the tip about Open Library! I didn't know about that project.


The Internet Archive is a not-for-profit that deserves your support. Not only do they archive the Internet, they are involved with several projects that will help make it better and even more useful.

They have an informative blog about donations: http://blog.archive.org/donation-faqs. Or just go to http://archive.org/donate and make a donation.


One of the few places I donate to every year and will continue to do forever. I use it constantly and I'm amazed every time I find some old site that I thought was lost to time. These folks are doing truly important work.


Aaron Swartz & Jason Scott. More people like them is what we really need!


Do they also archive youtube videos?


Yes! Also, YouTube shows you the URIs of deleted videos on playlists, so it's pretty trivial to copypaste them into IA and youtube-dl a copy.

I have snatched many things I loved from the jaws of YouTube's copyright enforcement algorithm this way.
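
If you'd rather script the lookup than paste URLs by hand, here is a small sketch using the Wayback Machine's availability endpoint. The response shape (`archived_snapshots` with a `closest` entry) is an assumption based on how that endpoint is documented, and the helper names are mine. It only tells you whether a snapshot of the page exists, not whether the video itself was captured.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the availability query URL for a given page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timeout=10):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urlopen(availability_url(page_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    # VIDEO_ID is a placeholder, not a real video.
    print(lookup("https://www.youtube.com/watch?v=VIDEO_ID"))
```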


TIL!


They accept uploads of videos originally on YouTube (like "Me At The Zoo", the first ever YT video[1]), but there's no automatic crawling like with the Wayback Machine, AFAIK. Still, the "YouTube" subject in their video archive has almost 1M videos.

[1] https://archive.org/details/MeAtTheZoo


Do they not have a feature to archive videos on request?


Not that I know of. There are some tools for automatic copying from YT to IA, but they all download the video to your machine and upload it afterwards.


that's a good question. i have an archive of star trek fan films, almost all of which are youtube originals that i'd like to somehow upload to the internet archive.


It's crazy to think of the internet without the Internet Archive. It's really fundamental work.



