Secretly Public Domain: Most books published in the US before 1964 (crummy.com)
286 points by BerislavLopac on Aug 2, 2019 | 62 comments



I wish we would bring back manual renewals for copyright. I get that publishers thought it was a hassle to have to do the work for stuff they had not published in decades and had no intention of ever reprinting, but that's kind of the point, isn't it? Why is the government locking away works that the original rightsholders no longer care about? The copyright system is supposed to serve the common good; what common good is there in locking away works simply because the original owner lost interest in them? Or worse, when nobody knows who or where the original owner is.


I'm not a big fan of copyright renewals. There are two arguments I hear in favor of them. 1) It allows orphaned works to enter the public domain sooner. 2) For works that are renewed, it provides information on who the copyright holder is and how they can be contacted.

On the first point, I think that renewal is a poor proxy for determining which works are orphaned or no longer have commercial value, instead just showing which copyright holders have the most diligent bookkeeping. A large number of published works will lose copyright protection just due to oversight. A simple mistake shouldn't result in disproportionate consequences (loss of decades of revenue). At the same time, other works that are no longer available on the market will be prevented from entering the public domain by companies that just renew everything.

Furthermore, historically, the USPTO has done a poor job of keeping (and making accessible) records of which works were renewed, and which weren't. Anyone who wants to use an orphaned work has needed to prove a negative, which is difficult to do, so most assume that everything is still copyrighted to avoid liability anyway.

Lastly, just because someone has registered a work (for renewal or otherwise) is not proof that they hold copyright on that work. People have spent decades trying to determine the rightful owner of a work so they can license it, even when it was registered.

Instead, I think what we need is a process for the USPTO to grant third parties permission to use a work when a good faith effort has been made to determine and contact the copyright holder. In my view, this would involve paying statutory license fees to the USPTO, which would hold them in escrow. If the owner is eventually determined, they can claim the money, and if no one claims it after some time it goes to fund the endowment for the arts. This would allow the public access to orphaned and disputed works again, and would encourage registration without the strong consequences for forgetting.

I would actually take this a step further and declare that all works are subject to statutory licensing for the second half of their copyright duration, not just orphaned works, and the above process would just be a fallback when the copyright holder can't be contacted. And of course, would shorten the copyright duration as well.


> On the first point, I think that renewal is a poor proxy for determining which works are orphaned or no longer have commercial value, instead just showing which copyright holders have the most diligent bookkeeping. A large number of published works will lose copyright protection just due to oversight. A simple mistake shouldn't result in disproportionate consequences (loss of decades of revenue). At the same time, other works that are no longer available on the market will be prevented from entering the public domain by companies that just renew everything.

So in other words, renewing should cost a decent amount of money, so that not every work is worth renewing?

I don't really see the fact that people will let things enter the public domain sooner as a downside, even if it would (theoretically) have earned them more money. If they aren't organized enough to renew the copyright, why would they be organized enough to actually use the work?


I don't really see a big problem. You register a copyright within a year of publication. The only one who can renew it is you or someone you designate, which must have a paper trail. You supply contact info. If you cannot be contacted by that contact info, copyright claims for your work are not enforceable.

If you can't be bothered to do this, why should you be entitled to copyright protection?


So if your house burns down and your "paper trail" turns to ash as a result, you automatically lose copyright on all your publications?


The legal system has well-established means for dealing with the loss of important papers like titles, deeds, contracts, wills, powers of attorney, passports, birth certificates, receipts, etc.

For example, you can make a certified copy, and deposit the copy in a safe deposit box or with your lawyer.


Right, but let's say you didn't do that because you weren't well-versed in the legal system, so all you did was register your copyright, then transfer the registration to your friend, who then stashed the documentation in a filing cabinet.

Should you have to be an expert in the legal system to avoid losing your intellectual property when your house burns down?


Like I said, there are established legal precedents for what to do if you burn your papers.

Besides, the whole reason lawyers exist is so that people who aren't experts in the legal system can hire one. You're not going to get very far in business without discovering you need the services of a lawyer and a CPA. Often the hard way.

I recall the actor Will Smith (Fresh Prince) who discovered the hard way that he needed the services of a CPA.


You're not answering the question.

"Let's take a system that works today and impose an additional paper trail, where you lose all your rights if you lose the paper trail."

"Won't that penalize people who don't have a lawyer and who lose their paper trail?"

"Tough, people need to learn to use lawyers"

In fact, now that I write that out, that's an extremely antagonistic approach to take and it punishes the least privileged people (e.g. the people who can't afford lawyers), while doing absolutely nothing for the giant corporations who have lawyers on retainer and will naturally have a process in place to always retain proper copies of the paper trail.


> Instead, I think what we need is a process for the USPTO to grant third parties permission to use a work when a good faith effort has been made to determine and contact the copyright holder. In my view, this would involve paying statutory license fees to the USPTO, which would hold them in escrow.

In this case isn't the government stealing from the copyright holder and/or public domain? If I use a presumably abandoned work as a jumping off point to a billion dollar franchise and then the original copyright holder comes out of the woodwork to claim 50% of everything I've made will the government escrow cover that? What if two people come out claiming to own the copyright? This solution seems to have all of the problems of the current system and offer little to fix the problem.

I agree that the old paper recordkeeping was not scalable to the modern world, but an online registry of works doesn't seem out of the question. It's exactly the sort of thing the web should be good for. Renewals could also be done online. For easy lookup a standardized copyright numbering system could be optionally added to each work similar to an ISBN.

The hardest problem would be validating that people actually have rights to what they claim on the service. There would undoubtedly be a cottage industry of people seeking out forgotten works and registering them under their name for the license fees. A modest fee for renewal should help reduce that problem, as would stiff penalties for knowingly stealing other people's work. The fees would be used to run the massive server farm necessary for keeping all of those records available. I would also suggest that a modest copyright term, say 15 years or so, would automatically be applied to any work and you would only need to register if you're planning to renew the work after the original term is up. We don't need people writing bots that automatically register every forum post or email or Google query they make, because there will definitely be people who try that shit just to troll for licensing dollars.


> If I use a presumably abandoned work as a jumping off point to a billion dollar franchise and then the original copyright holder comes out of the woodwork to claim 50% of everything I've made will the government escrow cover that?

It would be the same as any other statutory licensing system, where once the government grants you permission to use the work, and you pay the statutory fees, the copyright holder has no claim to any rights in your derivative works. The only right they have at that point is to the fees, and it is between them and the government as to whether they are the rightful copyright holder, no reason to involve you.

Again, I think this would work smoother if all works had say 20 years of full copyright protection, just like today, and then another 20 subject to statutory licensing. Then in that second period there would be no dispute over whether you, the third party, had the right to use the work, the only dispute is over who gets the statutory fees, and the government agrees to be the middle man in situations where that is not clear, so that uncertainty of ownership won't stifle use.

> In this case isn't the government stealing from the copyright holder and/or public domain?

Stealing implies that either the copyright holder or the public have some inherent ownership right in the work; however, their rights are only what the law grants them. In this system, works would not enter the public domain until after the second period is over, so the public domain has no rightful claim to them (orphaned or not). In the second phase of copyright, the holder would no longer have absolute right to control the work, and only limited rights in setting prices. Anyone could redistribute or make derivative works (subject to trademark law, etc), as long as they paid the statutory fees. If a copyright holder abandons the work, then you cannot blame the government for treating the fees similar to other abandoned property.


Grant short-term copyright for any written works, but require registration and periodic renewal for longer duration, with increasing fees for subsequent renewals.


I would be in favor of continuously increasing percentage of revenue based tax, not some arbitrary fee.


That's kind of hard to measure though. It would require a LOT of government auditing to catch cheaters. Like Hollywood movies would never have to pay to be renewed because they never make any net revenue.


The copyright system was never intended to serve the common good. Rather, its basis was the "Statute of Anne". It was about censorship and regulation of the book trade.

For more information: https://en.wikipedia.org/wiki/Statute_of_Anne

Now, it said already back then that its intent was the "encouragement of learning", but the intent was very thinly veiled.



Here is a very nice table listing pretty much every case under US copyright law [1].

For books first published in the US, here is what is now in the public domain.

• Books published before 1924.

• Books published from 1924 through 1977 without a copyright notice.

• Books published from 1978 to early 1989 without a copyright notice and without a subsequent registration within 5 years.

• Books published from 1924 through 1963 with copyright notice but whose copyright was not renewed. (This is the case for the books the article is talking about).

• Books prepared by an officer or employee of the US government as part of the person's official duties.

[1] https://copyright.cornell.edu/publicdomain
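The five cases above can be read as a small decision procedure. Here is a minimal Python sketch of that reading, as of this thread's date (2019). The function name and parameters are made up for illustration, it works at year granularity only (so the "early 1989" part of the third case is left out), and it encodes nothing beyond the five bullets; the linked table has many more cases.

```python
from typing import Optional


def public_domain_reason(
    year: int,
    has_notice: bool,
    renewed: bool = False,
    registered_within_5_years: bool = False,
    us_government_work: bool = False,
) -> Optional[str]:
    """Return which of the five listed cases applies to a book first
    published in the US, or None if none of them does.

    None does NOT mean "still copyrighted"; it only means these five
    simplified rules do not establish public domain status.
    """
    if us_government_work:
        return "US government work"
    if year < 1924:
        return "published before 1924"
    if 1924 <= year <= 1977 and not has_notice:
        return "published 1924-1977 without a copyright notice"
    # Year granularity: the "early 1989" portion of this case is not modeled.
    if 1978 <= year <= 1988 and not has_notice and not registered_within_5_years:
        return "published 1978-1988 without notice or timely registration"
    if 1924 <= year <= 1963 and has_notice and not renewed:
        return "published 1924-1963 with notice but never renewed"
    return None


if __name__ == "__main__":
    # The case the article is about: notice present, renewal never filed.
    print(public_domain_reason(1955, has_notice=True, renewed=False))
    # Same book, but renewed: none of the five cases applies.
    print(public_domain_reason(1955, has_notice=True, renewed=True))
```

The `renewed` flag is the hard part in practice: it is exactly the fact that the newly digitized renewal records are supposed to make checkable.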


The important point in this article is that the NYPL just recently digitized the relevant records, thereby making it possible to determine whether a copyright was renewed. The old rule of thumb (anything after 1923 is presumptively protected) was used because looking for the absence of a renewal in a multi-decade span of printed records just wasn't feasible.

The linked post in the article is very detailed and has more background: https://www.nypl.org/blog/2019/05/31/us-copyright-history-19...


I would be very surprised if this were true. I volunteer with standardebooks.org on getting books in the public domain transcribed, proofread, and typeset for modern e-readers. There is a swarm of books that we would like to upload but can't, due to unclear copyright status on them. This includes stuff that everyone and their mother has probably bought several times already, including the works of HP Lovecraft, Robert E. Howard, etc. For most of these, we still follow the Mickey Mouse rule, and do not proceed until we are absolutely sure they're safe.

Case in point, I've been thinking about transcribing The Worm Ouroboros[0] for some time, as it should be in the public domain even by conservative estimates, and yet I can't formally verify its copyright status, so I haven't yet.

[0] https://en.wikipedia.org/wiki/The_Worm_Ouroboros


Alex from SE here. I wanted to transcribe Worm a few years ago too. IIRC the problem was that PG-AU had a post-1923 edition transcribed. The 1922 first edition is extremely rare. The Newberry Research Library in Chicago has a copy and I actually flipped through it once. But you can't check out those kinds of rare books. They can scan books for you but at a fairly steep cost and I didn't want to do that at the time.

Maybe the situation has changed since then! But I'd love to have that book in our catalog.


Would I be able to take photos of the book pages if I'm in Chicago without checking it out (on prem capture)? I am familiar with using a DSLR to non-destructively capture book contents, and am willing to make the time to do so. Looks like you even get 2 hours of parking for free!


Possibly, you should ask them. However most decent OCR requires a flat top scanner or (in the case of rare books like this) a scanning device that holds pages firmly and takes pictures from above. I imagine that bringing your DSLR and doing it by hand for 200+ pages would be extremely tedious and error-prone.


You can get decent OCR by using thinner books to level the pages: support one side of the book as you flip, and hold it open with your hands. Source: I have OCRed approx 200,000 pages this way for one of my websites. It is not the best, but it works surprisingly well as a poor man's just-get-it-done method. Aim for at least 300 ppi and ABBYY can figure it out. 200 ppi still works well.
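To check whether a given camera can hit those ppi targets, the arithmetic is just page size in inches times ppi. A rough Python sketch follows; the 6 x 9 inch page is an assumed example size, not a measurement of any book mentioned here, and the helper names are made up. It also assumes the page fills the whole frame, which it won't in a handheld setup.

```python
from typing import Tuple


def required_pixels(page_width_in: float, page_height_in: float, ppi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions needed to capture a page at the given ppi,
    assuming the page fills the frame exactly."""
    return round(page_width_in * ppi), round(page_height_in * ppi)


def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count expressed in megapixels."""
    return width_px * height_px / 1_000_000


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch page at the two resolutions mentioned above.
    for ppi in (300, 200):
        w, h = required_pixels(6, 9, ppi)
        print(f"{ppi} ppi: {w} x {h} px, about {megapixels(w, h):.2f} MP")
```

Since the page never fills the frame perfectly, leave yourself a margin above whatever number this gives.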


I have a portable cradle I can bring with me that will hold the camera above while I flip pages. I emailed Newberry Library, and will report back when I have more info.


I have time to allocate to this next week and am willing to help. Was at Newberry just last week for their monumental book sale.


I will be in touch when I hear back!


There are three copies of the 1922 Cape edition listed for sale on abebooks right now, though you would have to cough up about $500 to get one.

Was the text revised for the US edition?


It was published in 1922, and the author died in 1945. Seems like that book is PD by any measure.

According to the article, it looks like Gutenberg adds a layer of "let a lawyer sign off on it", presumably to keep them from getting sued out of existence. Is that the case for Worm Ouroboros?


It doesn't seem that surprising to me. The lack of clear US copyright status was entirely because there was no easy way to exhaustively search the copyright registration/renewal records, and now there is. If there is no renewal contained in these records, it is definitively not copyrighted.


It's been almost two decades since I've worked with Project Gutenberg and Distributed Proofreaders, but back then they had pro bono counsel who would provide opinions and clear works prior to publication. Does standardebooks.org not have the same?


Project is too small, from my understanding (I'm only a contributor, not a core team member). You can see from the contribution guidelines[0]:

> Ebooks that are not clearly in the U.S. public domain. If it’s not on Gutenberg, we’ll probably decline it.

So we're basically piggybacking off of the copyright verification work we assume that PG has already done. This is one of the reasons I haven't started The Worm Ouroboros yet: it's in Australia's Gutenberg archive, but not the US one.

[0] https://standardebooks.org/contribute/accepted-ebooks


Wait, so what do you do that Gutenberg doesn't?


Gutenberg does amazing work. Full stop.

With that said, their specialty is in the transcription part. If you try reading one of their public domain works on a Kindle, they're often full of formatting problems and typos, since the transcripts are sometimes sourced from OCR scans. I've known people who tried starting a free book from Gutenberg, but eventually gave up and bought the same e-book off Amazon for a dollar because it at least had a working TOC. That kind of sale saddens me greatly. The end-user thinks, "Ah, it was only a dollar, I got my money's worth," but the publisher has basically paid nothing for the work, adds a few hours of digital typesetting, and then makes 100% profit on the sale.

Standard Ebooks often uses the Gutenberg raw text as a starting point and then cleans it up. We have a set of tools used for the initial cleanup process[0] that handles pagination, TOC generation, and some other basic "modernization" steps. The texts are then proofread and edited to conform to our style guide[1], which aims for maximizing readability on modern e-reader devices, as well as adding semantic meaning to any text markup. You can look at the guide for producing such a book to get a better idea of the process.[2] The end result is a free, public domain work which looks and feels like a professional production.

[0] https://github.com/standardebooks/tools

[1] https://standardebooks.org/contribute/typography

[2] https://standardebooks.org/contribute/producing-an-ebook-ste...


Fantastic comment, thank you for your efforts (and the rest of the folks at standardebooks.org). I haven't looked, but I'd ask that whenever possible, final artifacts are also uploaded to the Internet Archive.


Seems like the website describes it pretty well:

https://standardebooks.org/


If it was first published in the US in 1922, as it appears, it is unquestionably in the public domain here.


A copy is available from HathiTrust: https://catalog.hathitrust.org/Record/102154371


Wow... is this authoritative? If true, this is worthy of an article in major news publications, no?

And if legally sound, then should we expect Google Books to make these 80% of books fully available, as all books from 1923 and prior already are?

Also... how are we only figuring this out now?


I've had good luck getting Google to mark books as public domain, at least if they are government publications. Here's what I do: I click on the "Report an issue" link at the bottom, select "I have a question or feedback about a book", and fill out the form, selecting "I’d like to see the entire book, and I believe the book is in the public domain" as the reason. Doesn't work every time, but I'd guess it has worked over 90% of the time for me. Might work for other reasons too. I don't know at the moment, but I'll give it a shot at the next opportunity I have.


As far as I can tell, Google Books has never tried to apply specific renewal information in determining what books can be shown in full. But HathiTrust gets the same books as Google Books and they do use such information. Try it if you find a book in this time range on Google Books that has limited visibility:

https://www.hathitrust.org/


Lots of TV shows before 1970 weren't renewed or had copyright notice errors, making them PD -- many of them on archive.org: https://infogalactic.com/info/List_of_TV_series_with_episode...


(Edit: I misread the original claim, my fault. So I'm fixing the text here.)

I think it's important to look at authoritative documents, especially since copyright duration is absurdly complicated. You can learn more about copyright duration in the US here: https://www.copyright.gov/help/faq/faq-duration.html

Circular 15a "Duration of Copyright" from the US Copyright Office ( https://www.copyright.gov/circs/circ15a.pdf ) says in heading "Automatic Renewal and Voluntary Registration" that:

* "Mandatory Renewal Works originally copyrighted between January 1, 1950, and December 31, 1963. Copyrights in their first 28-year term on January 1, 1978, still had to be renewed to be protected for the second term. If a valid renewal registration was made at the proper time, the second term will last for 67 years. However, if renewal registration for these works was not made within the statutory time limits, a copyright originally secured between 1950 and 1963 expired on December 31 of its 28th year, and protection was lost permanently."

* "Works originally copyrighted between January 1, 1964, and December 31, 1977. Congress amended the copyright law on June 26, 1992, to automatically renew the copyright in these works and to make renewal registration for them optional. Their copyright term is still divided between a 28-year original term and a 67-year renewal term, but a renewal registration is not required to secure the renewal copyright. The renewal vests on behalf of the appropriate renewal claimant upon renewal registration or, if there is no renewal registration, on December 31 of the 28th year."

So there is a cutoff before 1964. However, it's not at all clear to me how many times the copyright holders didn't renew their copyrights. I would be unsurprised if the big publishers did typically renew them. If a book was published in 1950 and it was renewed, then the copyright would continue until 2045 (1950+28+67).

I've only listed 2 cases (just before & just after 1964), but there are actually many more cases and it's all quite complicated.
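To make the arithmetic for those two cases concrete, here is a small Python sketch. It follows the same 28 + 67 sum used above and covers only the two quoted cases (1950-1963 and 1964-1977); the function name is invented for illustration, and the precise expiry dates are whatever Circular 15a says, not what this rounds to.

```python
from typing import Optional

FIRST_TERM_YEARS = 28    # original term, per the quoted circular
RENEWAL_TERM_YEARS = 67  # renewal term, per the quoted circular


def last_year_of_protection(copyright_year: int, renewal_registered: bool) -> Optional[int]:
    """Last calendar year of protection for the two quoted cases.

    Returns None for years outside 1950-1977, which the quotes don't cover.
    """
    if 1950 <= copyright_year <= 1963:
        # Mandatory renewal: no renewal registration means only the first term.
        if renewal_registered:
            return copyright_year + FIRST_TERM_YEARS + RENEWAL_TERM_YEARS
        return copyright_year + FIRST_TERM_YEARS
    if 1964 <= copyright_year <= 1977:
        # Automatic renewal: registration is optional, the full term applies.
        return copyright_year + FIRST_TERM_YEARS + RENEWAL_TERM_YEARS
    return None


if __name__ == "__main__":
    print(last_year_of_protection(1950, renewal_registered=True))   # 1950 + 28 + 67
    print(last_year_of_protection(1950, renewal_registered=False))  # 1950 + 28
    print(last_year_of_protection(1970, renewal_registered=False))  # 1970 + 28 + 67
```

The gap between the renewed and unrenewed results for the same 1950-1963 year is the whole point of the article: 67 years of protection hinge on one filing.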

I think copyright duration is grossly overlong, far in excess of what is needed to get people to create works.


I'm interpreting your quote as applying only to works first copyrighted after 1964. But, the poster is referring to works copyrighted before 1964.


> So there is a cutoff before 1964. However, it's not at all clear to me how many times the copyright holders didn't renew their copyrights. I would be unsurprised if the big publishers did typically renew them.

That doesn't sound that surprising to me either, but I don't think that matters for the claim that most are PD now. My intuition would be that big publishers wouldn't be responsible for a large percentage of the published titles in that time period. Certainly, from the standpoint of units sold they would be, but not distinct works.

There's a lot more niche, technical, academic, low-end, etc works even though most people consume the mass-market popular stuff. I would expect that all those long-tail things to be the ones that never got renewed, and to be in the majority as far as titles go.


This affects earlier versions of the Alcoholics Anonymous "Big Book", and I'd heard it was a sore point for them.

http://silkworth.net/gsowatch/1939/uslaw.htm


Can anybody tell me if the same laws that cover the words also cover illustrations in old books?


As far as I know the copyright terms for illustrations are the same as the ones for the written word.

However, you could theoretically have a situation where the artist renewed the copyright on the images but the author did not renew the copyright on the words, or vice versa.


Why is there no netflix for books? Google already scanned most of the world's books. The only thing that would be scary is someone knowing what books I read when and where. (my usual privacy concerns ...)


There is. It is called the library. My local library has plenty of ebooks and audio books that I can borrow through both their website and their partners' websites like overdrive.com. And it doesn't even cost you a monthly fee because it is already paid for by your tax dollars whether you use it or not.


Though there are often long lines to wait for many books, because the licensing model is dumb.


There already is. Amazon has had Kindle Unlimited ($9.99/month) for quite some time now.


And of course there's O'Reilly's Safari subscription for programming/tech books...


Archive.org has a digital borrowing concept. I don't like digital renting though. If you are paying, better just to buy DRM-free and keep your copy.




I would love to have all the winners of the Hugo Awards (the first were given in 1953).


This is really awesome and I wasn't aware of it! For a lot of the pulp sci fi stuff on Project Gutenberg, it was just a handful of volunteers doing the copyright research and I think it was a big bottleneck. I remember thinking of getting involved in that research but it was far too complicated and time consuming. Making it machine-readable is a huge step to opening up the public domain!


> This is how Project Gutenberg is able to publish all these science fiction stories from the 50s and 60s.

Interesting. I was wondering how Project Gutenberg put Robert Sheckley's stories on-line (and why only some and not all of them): https://www.gutenberg.org/ebooks/author/2960

It also highlights again how crazy the copyright term has become, and that it needs a serious rollback to sane levels.


> This only represents 10% of the 80%, but it's the ten percent most likely to be interesting,

Does anyone have any insight into why the chosen 10% is the most likely to be interesting?


.. for some value of "most"


The article says 80%


I would also keep in mind Sturgeon's Law ("90% of everything is crud", first published 1957 in his book review column, interestingly enough).

Most books are crap. Interchangeable crap; maybe you'd enjoy some of them or learn something, but you could pick at random from the interchangeable pile of crap and be equally entertained and educated.

If you trawled that 80% of pre-1964 books, most of it would be crap, simply because most books are crap, whether they were published in 1964, 2014, or 1864. And more of it would be crap, because if you are differentially renewing copyrights on books in your catalogue, you are trying to renew the good stuff and forget the crap.

That's not to say that good books didn't fall out of copyright, for a million reasons, but if you have five books you like from before 1964, and you go check the copyright status, don't be surprised if fewer than 4 are part of that 80%.



