How Badly Is Google Books Search Broken, and Why?

ppod · on Feb 18, 2019

Google tried to make this work, but they were sued; and then they made a deal, and then many, many people objected to the resulting deal. This is the usual process whereby a corporation is first criticised for having too much power, and then when they relinquish power they are criticised for not doing enough.

https://www.theatlantic.com/technology/archive/2017/04/the-t...

The end of that article is a not-so-subtle plea for someone within Google to perhaps accidentally, anonymously place this material in public.

zeveb · on Feb 18, 2019

It's fascinating to read that article:

> Page wanted to know how long it would take to scan more than a hundred-million books, so he started with one that was lying around. Using the metronome to keep a steady pace, he and Mayer paged through the book cover-to-cover. It took them 40 minutes.

> Michigan told Page that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.

> In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.

> At its peak, the project involved about 50 full-time software engineers. They developed optical character-recognition software for turning raw images into text; they wrote de-warping and color-correction and contrast-adjustment routines to make the images easier to process; they developed algorithms to detect illustrations and diagrams in books, to extract page numbers, to turn footnotes into real citations, and, per Brin and Page’s early research, to rank books by relevance.

Doesn't that take you back to an optimistic time when Google was exciting, and we thought that it could do amazing amounts of good for the world? I miss that era.
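The figures quoted above invite a quick back-of-envelope check. Here is a rough sketch in Python that uses only the article's numbers (40 minutes per book, 7 million volumes, six years, 25 million books for an estimated $400 million). The 2,000 working hours per operator per year is my own assumption, not something the article states, and the function names are just illustrative.

```python
import math


def scanning_hours(volumes, minutes_per_book):
    """Total hand-scanning time in hours at a fixed pace per book."""
    return volumes * minutes_per_book / 60


def operators_needed(volumes, minutes_per_book, years, hours_per_year):
    """How many people paging through books in parallel would be needed
    to finish within the given number of years.

    hours_per_year is an assumed figure (roughly one full-time job).
    """
    total = scanning_hours(volumes, minutes_per_book)
    return math.ceil(total / (years * hours_per_year))


def cost_per_book(total_cost, books):
    """Average cost per scanned book."""
    return total_cost / books


# Michigan's 7 million volumes at the metronome pace of 40 minutes each,
# finished in Page's six years, with an assumed 2,000 hours per operator-year.
print(operators_needed(7_000_000, 40, 6, 2000))   # 389

# The article's estimate: $400 million for about 25 million books.
print(cost_per_book(400_000_000, 25_000_000))     # 16.0
```

So at the hand-paging pace the six-year target would have needed a few hundred people turning pages full time, and the reported totals work out to about $16 a book.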

icelancer · on Feb 19, 2019

Yeah, it does. It's also not wholly Google's fault, obviously, since ppod clearly illustrated why. No good deed goes unsued.

jacobolus · on Feb 19, 2019

I worked for the Harvard library for a while (as a part time student job) examining the quality of a lot of these Google scans in ~2005.

I’m frankly not very impressed by Google’s scanning process, OCR, image detection, de-warping, contrast adjustment, general QA/QC, etc.

Quality is variable, peak quality is mediocre, and the results are largely useless for anything but text.

Like a lot of other parts of Google, it’s a case where they tried to cheap out on trained human labor, and make it up with algorithms.

On the upside, even a mediocre book scan is better than nothing.

dtagames · on Feb 18, 2019

Exactly my point.

The "deal" resulted in anyone being able to download the full books if they go to a university library which is a partner of Hathi Trust. Of course, after that download, you have the full PDF and you can do whatever you want!

This isn't protection for copyright holders. (Hathi Trust got its books from Google Books scans and doesn't pay copyright holders.) This "deal" isn't helping anyone and it hurts researchers.

ahi · on Feb 18, 2019

HathiTrust does not provide access to in copyright materials to anyone. They provide research access for "non-consumptive" use through the HathiTrust Research Center.

dtagames · on Feb 19, 2019

That's not correct. First of all, all books are copyright by their authors.

Anyone can go to a partner library (there's a list on the login page of the Hathi site) and download those books as PDFs. Try it! I've done it many times.

ahi · on Feb 19, 2019

These pages provide a decent overview of access restrictions for HathiTrust materials: https://www.hathitrust.org/features_benefits https://www.hathitrust.org/out-of-print-brittle

wumpus · on Feb 18, 2019

The "deal" (which failed, btw) was a crappy one. The only solution is legislation, not a "deal" made with the Authors Guild that would give Google and only Google special status.

dtagames · on Feb 18, 2019

Here's another little-known fact: Every American is entitled to a library card from the Library of Congress (you have to go there in person to get one).

The Library of Congress is a Hathi Trust partner! So if you go get that card, you can download all of the out-of-print books that Google scanned on your own computer. No copyright holders are getting paid (and no one is being harmed), so why all these barriers in-between?

lstamour · on Feb 18, 2019

I can confirm that the Library of Congress Reader card is all it takes to log in. And you don’t have to be a US citizen to get one, but you do need a passport or other US-recognized identification to present and validate in person in D.C. And you have to do a bit of research on how and where to get it; they don’t just hand them out at the front desk as other libraries might. The Library of Congress uses the card to distinguish researchers from one-off tourists, and so while the card is easy to get, they have just enough process in place that it’s clear it’s not a souvenir, and you have to traverse a maze of hallways to get it. (Or you had to when I did, at least...) But once you have it, just log in online and you’ll have access to Hathi Trust.

sct202 · on Feb 18, 2019

It's too bad you can't just plug in a license, passport number or something online and get a virtual card. Seems like such a wasted opportunity to expand access to resources for pretty much free.

amyjess · on Feb 18, 2019

Thank you and mimixco so much for pointing this out.

If I ever end up visiting DC again, I'm definitely going to do this. I have a new bucket list item!

lstamour · on Feb 20, 2019

Don’t forget to check out some of the amazing reading rooms while there! :) I’d love to go back!

chii · on Feb 19, 2019

If you don't need to be a US citizen, does that mean I can come as a tourist, and (even though it's convoluted) I can get the card using my passport?

lstamour · on Feb 20, 2019

Yep. I’m a Canadian, got one while visiting family in the US. I did at the time work for the Toronto Public Library, but that wasn’t a consideration for them. :)

btrettel · on Feb 18, 2019

Thanks for pointing this out. I'm graduating relatively soon and was disappointed that I couldn't download books after that. Just tried my Library of Congress login and everything worked great.

Note that a Library of Congress card expires every 2 years as I recall.

wyclif · on Feb 19, 2019

Do you need to return to DC to renew it every two years?

btrettel · on Feb 19, 2019

Yes, as I recall, someone I know was not able to log in when their card expired. They had to get a new card in person.

maxlybbert · on Feb 18, 2019

I understood that there is an exemption in US law just for libraries ( https://copycense.com/2012/03/05/section-108-fair-use-are-no... ), but it wasn’t clear whether Google was covered when acting as an agent for the libraries.

But it seems to be the reason to require the rigamarole with library cards.

userbinator · on Feb 18, 2019

> So if you go get that card, you can download all of the out-of-print books that Google scanned on your own computer.

Do you have to actually go there in person too, or do you just get some sort of credentials (which no doubt some people would have already shared...)?

dtagames · on Feb 19, 2019

Once you have the Library of Congress card, it is your credential to log in from any computer and use Hathi Trust with full downloads.

sodosopa · on Feb 18, 2019

No copyright holders being paid is theft. Authors write and should be paid for their efforts. This is glorified and legalized piracy.

dtagames · on Feb 18, 2019

No, we're talking about out-of-print books. There is no one around to claim copyright or payment.

No one is trying to download Harry Potter from Google Books, lol!

Fnoord · on Feb 18, 2019

> No copyright holders being paid is theft

No, it isn't. It is copyright infringement. Theft is something completely different. Different laws apply.

sodosopa · on Feb 18, 2019

If you take money out of an author/musician/artist’s hand for a work they created, you can call it anything you desire, but it’s theft and piracy.

Fnoord · on Feb 18, 2019

So if I decide not to take the bus, it's theft as well? If I decide to steal a bicycle, it's theft from the bus company?

wumpus · on Feb 18, 2019

If I visit an ad-supported website and fail to look at the ads, is that theft?

darkpuma · on Feb 18, 2019

Incredible as it may seem, I've actually seen that asserted. Years ago when I was a kid, before we had internet, my family would get a lot of magazines. Whoever picked up the mail, usually whoever came home most recently after it was delivered, would flip through every magazine and tear out all the advertisements and throw them in the trash before putting the magazine on the counter for the family to read. I wonder, was that 'theft'? We also used to change the channel or mute the television/radio whenever advertisements came on. Was that 'theft'?

Usually people who call adblocking theft start squirming when these pre-digital examples of ad avoidance are put to them.

EliRivers · on Feb 18, 2019

You're conflating literal theft with metaphorical theft. One of them is theft; the other isn't.

vkou · on Feb 18, 2019

Unless that author, musician, or artist operated in a vacuum, their work is a direct derivative of the work of other authors, musicians, and artists.

Every dollar a musician earns is, therefore, a dollar they take out of the hands of the musicians whose work they based theirs on.

This is why the public domain exists. You can make derivative works without paying anyone... But your work will fall into the public domain, so that you pass this benefit on to the next generation.

johnisgood · on Feb 19, 2019

But you don't take money out of their hand, because there is no money to take out to begin with. Or did you mean imaginary money? If you did, well, I could come up with many different ways as to how you are doing the exact same thing to virtually anyone. :P

Anyways, you can't physically remove and deprive an owner of an idea. It doesn't fit the definition of theft at all.

wumpus · on Feb 18, 2019

Did you know that some people consider libraries lending physical books to be theft?

jandrese · on Feb 18, 2019

If the public library system weren't grandfathered in it would never exist today. Too many people want their cut and nobody is willing to make a stand for the public good.

ocdtrekkie · on Feb 18, 2019

This is one of the reasons it's so crucial to support and fund libraries. There's no way we'll regain this amazing benefit to society if we lose it.

dtagames · on Feb 18, 2019

Well, they're crazy. If libraries didn't lend physical books, academic research would be set back 200 years.

jandrese · on Feb 18, 2019

Yeah, but think of all of the money that could be made by the publishers.

darkpuma · on Feb 18, 2019

I think maybe that's his point. There are some crazy people in this world, so no matter how reasonable anything is there will always be somebody who gets frothing mad over it for no rational reason.

wumpus · on Feb 18, 2019

I was more making the point that it's reasonable for copyright owners to not get paid extra in every circumstance.

But we're not going to have a nice discussion about it after an accusation of theft and piracy.

HNLurker2 · on Feb 19, 2019

Aren't all research papers in Europe supposed to be made free by 2020? https://www.theguardian.com/science/2016/may/28/eu-ministers...

Something Aaron Swartz envisioned?

sodosopa · on Feb 18, 2019

And that’s stupid.

ChuckMcM · on Feb 18, 2019

It isn't surprising, this bit is sad:

> Google Books has failed to live up to its promise as the company has moved away from its original mission of organizing information for people.

Google was only about organizing all the world's information while search ads were an unlimited fountain of money. As Google's ability to generate money with search ads has dwindled, their more grand (and not monetizable) projects have been either starved for resources or outright killed.

Sure, the lawsuit was a pain. And book publishers are turds for arguing that they still have rights to books that they won't publish ever again. But the courts found that there was nothing wrong with Google having the information[1]. That trove of text could be the world's greatest source of knowledge, but as we all know, people using internet search for work never click on ads, and not enough of them are willing to pay a subscription price to cover the cost of infrastructure. Google hoped at one time that they would make money by printing on demand those books that were out of print but that people wanted, but that was shot down by short-sighted publishers and agents. Perhaps it will be taken up by Amazon, which has the resources to do it.

[1] https://arstechnica.com/tech-policy/2015/10/appeals-court-ru...

porpoisely · on Feb 18, 2019

It is sad. But it's not just google books, it's google search, google news, youtube, etc.

There was a time when all of google's properties catered to the users. Their search engine was the best. Google news was the best aggregate site. Youtube recommends used to be amazing to the point you could spend hours following their recommends.

Now google search, google news, youtube, etc are all garbage. It doesn't serve the people. It serves corporate interests. You can thank media companies and the elites who pressured them for that.

ChuckMcM · on Feb 18, 2019

No need to cast aspersions on 'elites' and 'corporate interests'. What is pressuring them is that the ratio of the money coming in to the money going out has to be maintained at a certain level for Google to remain Google. They really have only two choices there: either sell more ads (which generally means putting ads on more things, or coming up with new ways to charge for new things like being in the 'shopping' box on product searches) or cut costs, which means shutting down projects, reducing staff, etc. Depending on how you look at it, Google gets something like one 500th of what they used to get for an ad on their web site.

porpoisely · on Feb 18, 2019

Am I casting aspersions or just stating facts? Google changed because of pressure from the elites and corporate interests who used the media to badger them. It certainly isn't in Google's interest to make their product worse purely to benefit others.

ChuckMcM · on Feb 18, 2019

> Am I casting aspersions or just stating facts?

Your previous comment attributed, without evidence, acts of malice to described but undefined third parties. That is the definition of "casting aspersions."

"Stating facts" would start with something like, "See this evidence that Google's policies were changed by <corporate entity> or <person or persons>."

Since you are doing the former, and not the latter, I conclude that the answer to your question is that yes, you are casting aspersions.

porpoisely · on Feb 19, 2019

I'd assume you'd already know that media and elite pressure is why google changed since most people here work in the tech industry. Are you new to HN or do you work in a non-tech industry?

This reporter claims that she got YouTube to change its search list.

https://twitter.com/aprilaser/status/1076215375732174848

Should we believe her or is she lying?

"Google follows Facebook's lead and removes 39 YouTube channels linked to Iran"

https://finance.yahoo.com/news/google-follows-facebook-apos-...

These channels had been up for many years. Why do you think all of a sudden google decided to remove them?

Certainly it wasn't corporate, media or elite pressure. So then who? Aliens? When Chinese or Russian social media companies remove content and change their policies, why do you think that is? Aliens as well?

After 10 years of spectacular success of youtube being "you"tube, why did it suddenly become "corporate"tube? Why did they change their recommends, trending, etc? Must be aliens. It can't possibly be the elites and the media constantly attacking it?

"Facebook and Google are doomed, George Soros says"

https://www.washingtonpost.com/news/the-switch/wp/2018/01/26...

johnisgood · on Feb 19, 2019

Yeah, in the beginning it was to gather lots and lots of people, which you can only do by being useful or living up to your claims, and as it gathered attention from Governments and so on, it had to comply with their regulations, effectively making it less and less useful. I remember the days when I could find any PDFs back then! Those days are over.

gwern · on Feb 18, 2019

> If I worked at Google, I would have implemented a text-based date-prediction algorithm to flag erroneously classified books. (I have actually done this and sent a list to the HathiTrust of books they may have erroneously released into the public domain. It works).

With friends like these, who needs enemies?
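For anyone wondering what a text-based date check along the lines of that quote might look like, here is a deliberately crude sketch. It is not the article author's algorithm, which the quote does not describe. It is a stand-in heuristic I made up for illustration: find the latest year that appears at least a couple of times in the OCR text, and flag the book if that year is later than the catalogued publication date. The function names, the regex, and the `min_count` and `slack` parameters are all hypothetical.

```python
import re
from collections import Counter

# Four-digit years from 1500 to 2099; a crude assumption about what
# counts as a plausible year in scanned text.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def latest_year_mentioned(text, min_count=2):
    """Latest year that occurs at least min_count times in the text.

    Requiring repeats is a cheap guard against OCR noise and stray
    numbers that merely look like years.
    """
    counts = Counter(int(y) for y in YEAR_RE.findall(text))
    frequent = [year for year, n in counts.items() if n >= min_count]
    return max(frequent) if frequent else None


def looks_misdated(catalog_year, text, slack=1, min_count=2):
    """Flag a book whose text repeatedly mentions a year later than
    its catalogued publication year (plus a little slack)."""
    latest = latest_year_mentioned(text, min_count)
    return latest is not None and latest > catalog_year + slack


sample = "Printed 1954. In 1954 the firm moved. It was founded in 1890."
print(looks_misdated(1890, sample))  # True: the text keeps mentioning 1954
print(looks_misdated(1954, sample))  # False
```

A real classifier would presumably use vocabulary and spelling as well, since a book cannot mention events after it was printed but can easily be reprinted long after the years it discusses.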

philipkglass · on Feb 18, 2019

If you're using HathiTrust seriously and aren't affiliated with a partner library, consider Hathi Download Helper to get complete public domain books archived to local storage. I wrote an earlier command-line version of the tool. Someone else built a GUI and put in the work to keep up with the evolving API.

I often use Google to locate a book, then check Internet Archive and HathiTrust if it's old enough that it should be public domain under US law. I really appreciate HathiTrust putting in the effort to check copyright renewals and make more of their materials fully visible. I don't appreciate the technical barriers to downloads that they erect, but that's out of the hands of the developers working there. As long as their web viewer shows individual pages you can be sure there will be a way to reassemble full books.

dtagames · on Feb 19, 2019

Thanks for that tip and for writing the software! I had to play that game of manually assembling PDF pages for an old magazine article lately.

robin_reala · on Feb 18, 2019

Hey, thanks for this. The Hathi interface is slow to say the least, so it ends up being my fallback to archive.org more often than not.

dtagames · on Feb 19, 2019

Once I needed a copy of The Congressional Record from the 40's and Google had marked it non-public! I had to make several calls to Hathi to explain that the Record is public by law and then they had to contact Google to get the restriction lifted.

It's a lot of rigamarole for information that researchers need.

babalulu · on Feb 18, 2019

Google Books has issues of Billboard magazine dating back to 1942. It used to be valuable for research, but it's become much less so over the years. Currently, search results that return actual magazine issues are limited to the first page. After that, it's just normal Google links. Even searching for something like Elvis or Glenn Miller, both of whom should have been in a crapload of issues, returns only one page of relevant results.

Trying to search by date is very hard. Limiting search to "Glenn Miller October 1942" might return one or two relevant results, or it might not return any. Trying to search by issue date doesn't work at all.

They have an index of Billboard issues which allows you to go to individual issues and read them, but the index stops at 50 pages, and for a weekly magazine, that limits the index to only a handful of years. Using the index, you can't go directly to issues before the 1980s, and with search by issue date useless, that means you're just out of luck if you want to see a particular issue in the 1970s.

rasz · on Feb 18, 2019

They do seem to be crippling book search on purpose. Just yesterday I was looking for "PC Mag 1997 january Pentium MMX" and Google refused to return results from the PC Mag 7 Jan 1997 issue. What's even weirder, clicking "browse all issues" returns

    The requested URL /books/serial/ISSN:08888507?rview=1 was not found on this server

but "About this magazine" will happily give you a list of all scanned issues :o and opening the January one will let you search it and will return positive results.

EliRivers · on Feb 18, 2019

I completed a Masters in Mathematics back in 2014, involving a lot of historical research into the development of geometry. At the time, the ease with which I could open up books written over a century ago, search them and read them, was a fantastically useful tool.

I have some of those sources still on my hard drive in their scanned PDF format. They've now effectively vanished from the open internet. So much, available for such a short window. Our children will never believe us when we tell them what was once right there at our fingertips, and those that do should never forgive us.

toomuchtodo · on Feb 19, 2019

Consider uploading them to the Internet Archive.

wumpus · on Feb 18, 2019

Here's an example of an exploration tool built using the content of books: https://books.archivelab.org/dateviz/

It would be nice if anyone could build such tools, but all of that data is locked up inside of places like Google Books and Hathi Trust. Google isn't even interested in making their metadata available, other than by running searches.

oasisbob · on Feb 18, 2019

This breakage reminds me of similar constrained-search problems with the Google Groups/Dejanews USENET archive. Once upon a time it was nice to research with.

darkpuma · on Feb 18, 2019

It reminds me of the demise of google's code search and github's woefully inadequate code search. Debian's Code Search is okay, but github allowing it would be great.

jetrink · on Feb 18, 2019

Searching within books is broken too. If you search for 'cat', any instance of the word 'cats' will also be returned (and any other word beginning with 'cat'), but with the message, "No preview is available." No link to the page of the result is provided. It seems like the kind of bug that should be straightforward to locate and fix, but it has been this way for years. (My guess is that the tool that builds previews doesn't recognize partial matches.)
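To make that guess concrete, here is a toy sketch of how such a mismatch could arise: a search step that matches by prefix paired with a preview step that only matches whole words. This is purely hypothetical and says nothing about how Google Books is actually built; every name here is invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search_hits(pages, query):
    """Prefix search: a page matches if any word starts with the query."""
    return [
        i for i, page in enumerate(pages)
        if any(word.startswith(query) for word in tokenize(page))
    ]


def build_preview(page, query, context=2):
    """Preview builder that only recognizes exact word matches.

    Returns None (think "No preview is available.") when the page
    matched only by prefix.
    """
    words = tokenize(page)
    if query not in words:
        return None
    i = words.index(query)
    return " ".join(words[max(0, i - context): i + context + 1])


pages = ["The cat sat on the mat", "Two cats slept", "A dog barked"]
for i in search_hits(pages, "cat"):
    print(i, build_preview(pages[i], "cat"))
# 0 the cat sat on
# 1 None
```

Page 1 is returned as a hit because "cats" starts with "cat", but the preview step finds no exact "cat" and gives up, which is the behaviour described above.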

dtagames · on Feb 18, 2019

It's not a bug. It's on purpose. They actively limit previews, full page views, and downloads -- even though you can go to a university and download the same book from Hathi Trust.

Hathi Trust isn't paying the copyright holders, either, so who cares?

dtagames · on Feb 18, 2019

Based on my experience with research using Google Books and Hathi Trust, this author is correct. Google has purposely broken Google Books so that it doesn't compete with any paid sources for these materials -- even if there aren't any paid sources and the book is out of print!

The TL;DR is that Google Books started out with the goal of digitizing every book ever written. Publishers sued, so they crippled their search and display functions and handed over the full texts they already had to a group called Hathi Trust.

Hathi Trust is seriously crippled on purpose. It only allows access to full texts when you are sitting in a physical library of one of its partner universities. That's right... I can drive to a big university, sit in their library, and download a full PDF of any book I like. But if I'm at my house, I can only read one page at a time in a browser. This is ridiculous. Hathi Trust is helping the oil business more than they're helping researchers.

The marriage of Google Books content and Hathi Trust as a distribution platform is a joke. In some cases, you will even have to order an old book from interlibrary loan (see worldcat.org) if you can even get it -- when all the while Google has a scanned copy!

kevin_thibedeau · on Feb 18, 2019

If you were an author of a book with active copyright you might want to get paid for it.

Spooky23 · on Feb 18, 2019

How?

My grandfather wrote a book in the 1940s that’s been out of print since the mid 50s. Every entity associated with the book is dead, including the publisher, which merged into another in the late 50s and is probably an inactive imprint of some successor company. Grandpa died in 1985, and a cousin or I is likely the heir to his rights.

I have a copy of the book, which I bought via Alibris from a bookstore in Wales 15 years ago. If you needed the book for research, you’d probably get it via inter-library loan from a university or a big city library. Whoever the publisher is, they don’t have it and aren’t selling it. In no scenario does anyone get paid for transacting, other than a reseller or the post office.

dwyerm · on Feb 18, 2019

Stay with me for a second; I'm going to go wildly afield... does intellectual property need an Adverse Possession law, too? [1]

Your grandfather had a book, the rights of which should have been presumably passed down to you. My grandfather had a patented mining claim that has been passed down to me. Where the ownership of your grandfather's book is questionable, for me the physical corners of the property are questionable. A number of them are defined by things like a "4 foot spruce post" or a "12 inch diameter tree trunk" that haven't survived the ravages of time.

But it is important that I patrol my property at least every couple of years because of Adverse Possession. If someone else were to use my property continuously and I don't say anything against it, one day their trespassing suddenly and magically would become ownership. For a land-owner, it is a scary idea that someone can steal my property from me, as has actually happened. [2]

But I can acknowledge that it makes some sense. It comes from the idea that land is meant to be used, and if you aren't using it, maybe the person who is using it should get the rights.

If nobody can stand up for an intellectual property claim, perhaps some kind of adverse possession is in order.

[1] https://en.wikipedia.org/wiki/Adverse_possession [2] http://articles.latimes.com/2007/dec/03/nation/na-land3

gwern · on Feb 18, 2019

We actually do have a sort of 'adverse possession' in US copyright law, for libraries: the "Last 20" clause, which IA has recently begun exploiting for distributing still-copyrighted books (https://blog.archive.org/2017/10/10/books-from-1923-to-1941-...).

mikez0r · on Feb 18, 2019

Counter-argument: Adverse possession is justified by the scarcity of real property. Which does not apply to IP.

- Real property (i.e. land) is a scarce and limited resource. If a party is making productive use of the land, they should hold title. (There is only so much arable land. If someone raises crops, let them.)

- Intellectual property (particularly copyright) is not a scarce or limited resource. (Create your own copyrightable work if you wish to own the rights.)

mikez0r · on Feb 18, 2019

Of course, adverse possession presumes that land ought to be made "useful." (Haven't thought yet about how these critiques of real property theory map to IP.)

Much of real property theory arose from the assumption that the government should recognize and encourage the "highest and best use" of real property. Traditionally, the highest and best use of land is the use that can most profit from the land's resources; often mining, grazing, farming, logging.

This is problematic.

- This view justifies colonization, and taking land from original inhabitants who don't use the land to extract resource value.

- This view does not recognize preservation of an ecosystem as a valuable use.

- This view does not account for externalities from use of the land's resources.

robkop · on Feb 19, 2019

This is a very interesting argument.

I think it's worth noting that you can calculate an estimate of the externalities and subtract that from the profit to achieve a more balanced justification. Unfortunately, unless you counted the loss of culture as an externality, you could still trivially justify the removal of land from those less productive/technologically advanced than you.

Furthermore, even though I'm not personally supportive of the removal of land at the individual's loss I do have to ask if the removal could account for a net gain overall; improving many people's lives. Perhaps profit isn't the best measure of improvement to the collective but it is at least indicative.

TeMPOraL · on Feb 18, 2019

> - Intellectual property (particularly copyright) is not a scarce or limited resource. (Create your own copyrightable work if you wish to own the rights.)

It isn't scarce for those seeking rent from it; it absolutely is for those seeking to use the works under copyright. In case of books, music, movies, games, etc., the works are not substitute goods. If I need a particular book for my research, there's a good chance I can't just take a different book instead. So there is acute scarcity involved for a subset of parties interested in a copyrighted work.

function_seven · on Feb 18, 2019

I like the analogy between real estate and copyright.

I think a simpler* solution would be to return to limited terms on copyright (say, 14 years), and require periodic renewal by the rights-holder to extend that term. As part of the renewal process, you'd either need to demonstrate active use of the copyright, or pay a fee (or both?).

* Simpler from a process point of view. I understand it's probably not simple politically.

TeMPOraL · on Feb 18, 2019

Exponentially increasing fee, set in a way that past two-three such renewals, even the biggest and richest corporations would think twice before paying up.
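A quick sketch of how fast such a schedule escalates. The $100 starting fee and the tenfold growth per renewal are numbers I picked for illustration; nothing in the thread specifies them.

```python
def renewal_fee(n, base_fee=100, growth=10):
    """Fee for the n-th renewal (n = 0 is the first), growing geometrically.

    base_fee and growth are illustrative assumptions, not proposed values.
    """
    return base_fee * growth ** n


def total_paid(renewals, base_fee=100, growth=10):
    """Cumulative cost of holding a copyright through that many renewals."""
    return sum(renewal_fee(i, base_fee, growth) for i in range(renewals))


for n in range(6):
    print(n, renewal_fee(n))
# 0 100
# 1 1000
# 2 10000
# 3 100000
# 4 1000000
# 5 10000000
```

With a 14-year term and these made-up numbers, the sixth renewal alone costs $10 million, so only works still earning serious money would stay locked up that long.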

dtagames · on Feb 18, 2019

Right... And since Google Books probably has a scan, who cares if you download the PDF?

This problem is impeding real research without helping anyone.

wumpus · on Feb 18, 2019

That's a classic "orphan copyright" situation.

ghaff · on Feb 18, 2019

Right. And orphan works legislation has tended to be opposed by individual creators or at least the organizations that purport to represent them. The (not totally unreasonable) theory is that individuals or their estates--let's leave aside the fact that copyright terms are almost certainly too long--may well inadvertently not keep on top of what's needed to keep works non-orphaned. But "Disney" (or whoever) will most certainly be ready to pounce on anything they can acquire for free for whatever technical reason.

dtagames · on Feb 18, 2019

As a matter of fact, I am.

I'm speaking here of old, out-of-print books, where this is very much a problem with Google Books and Hathi Trust.

jandrese · on Feb 18, 2019

It's also a fundamental problem with our current copyright law where material remains in copyright for many decades after it has stopped being printed and sold through primary channels.

Copyright should automatically expire 10 or 15 years after the last printing IMHO. If nobody cares enough to put it up for sale or even make a tiny print run just to renew the copyright, why should the government continue to enforce it?

Of course this scheme falls apart a bit in the digital age, except that even ebooks get pulled from the shelves for no apparent reason. Maybe we should just go back to having to explicitly renew copyright after 15 years or so, with a fee just large enough to convince people to drop dormant works. Maybe a couple hundred bucks every 5 years.

The best part would be having some easily accessed online system where you could check the copyright status of any work, including current contact information for the rightsholders if you want to arrange payment.

izacus · on Feb 18, 2019

Expiring copyright after 15 years no matter the printing might even be a better solution. That would allow other people to build and extend works, movies and software.

15 years is plenty to make a mountain of cash.

TeMPOraL · on Feb 18, 2019

Yup. The way I see it, a lot of IP problems are caused by people who want to take the law intended for promoting new works, and turn it into a source of passive income.

jandrese · on Feb 19, 2019

But people wouldn't be encouraged to make original works if their great grandchildren would be robbed of the opportunity to squabble over the rights to it in the courts.

dredmorbius · on Feb 18, 2019

Hathi will not provide access to even out-of-copyright works.

LibGen has no such problems.

dtagames · on Feb 18, 2019

Here's an easy example... Google Books has many old issues of Popular Mechanics which exist in full downloadable form on the Internet Archive -- yet you can only see individual pages on Google Books and can't download the magazine.

This is because Google Books is acting like they own the copyright (or, at least, they feel the need to police it).

There are many cases where you can download the entire book from Hathi Trust when you are sitting at a university library, giving you a PDF you can use anywhere. But you cannot even see the entire book or download it from Google Books (which has given its scan to Hathi). This is just stupid.

drallison · on Feb 18, 2019

Google Books seems to have major problems. An alternative interested parties should explore is Archive.org. The Internet Archive has a significant collection of scanned books and other materials.

phonon · on Feb 18, 2019

Using https://babel.hathitrust.org/cgi/ls?a=srchls&anyall1=all&q1=...

I get 14,515 results, with 3,115 of them full view.

There is also https://analytics.hathitrust.org/

which seems interesting!

dtagames · on Feb 18, 2019

And all of them are full view if you are physically sitting in a Hathi Trust "partner" university library. These libraries are open to the public and allow downloading and saving of the materials you browse, which makes locking them up completely pointless.

elektor · on Feb 18, 2019

Is it possible, then, for an Aaron Swartz/Sci-Hub character to download all of the available books and make them available on the internet?

sodosopa · on Feb 18, 2019

Yes, and that’s theft.

ahi · on Feb 19, 2019

This is not correct. "Partner" access does not provide additional access to in-copyright materials. Partners do have more convenient download options for public domain works, plus additional services.

toomuchtodo · on Feb 19, 2019

Mind emailing me? You don’t have contact info in your profile. I’d be interested in hanging at a library for a few hours with my laptop and have some questions before going.

dtagames · on Feb 19, 2019

To whom is this directed? If it's to me, just write your question here. I've pretty much already explained how it works. Go to the Hathi Trust login screen. Pull down the list of partner institutions. Go to one of them in person. (Or get a Library of Congress card, also in person.)

After the institution logs in, you can download anything you want. Bring a flash drive or portable hard disk and take home your PDFs.

I'd recommend calling the partner library first to make sure someone there knows the Hathi login. It's not that popular a resource, sadly, and many people have no idea what it is. You may also find a friendly librarian who is willing to do the download for you and email you the PDF, saving an in-person visit.

One thing to note: You don't have to be a student to use the resources of university libraries. They're open to everyone.

toomuchtodo · on Feb 19, 2019

Thank you!

ccleve · on Feb 18, 2019

I wonder if the date problem is a bug, not a feature?

Is it possible to dump the metadata of a book and check if they have the right date? There should probably be multiple dates for a book -- date written, date copyrighted, date published, date of latest edition, etc.

My guess is that Google does not have a publicly-available issue tracker for Google Books so you can't easily report this problem. Hacker News is a good way to get their attention, though...
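A rough sketch of the "multiple dates per book" idea, assuming a hypothetical record layout (the `BookDates` fields and the `suspicious_dates` helper are made up for illustration and are not how Google Books actually stores metadata). Keeping the dates separate makes it possible to flag records whose dates contradict each other:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BookDates:
    # Hypothetical fields: each is a year, or None when unknown.
    written: Optional[int] = None
    copyrighted: Optional[int] = None
    first_published: Optional[int] = None
    latest_edition: Optional[int] = None


def suspicious_dates(d: BookDates) -> List[str]:
    """Return human-readable warnings for dates that contradict each other."""
    problems = []
    if (d.written is not None and d.first_published is not None
            and d.first_published < d.written):
        problems.append("first publication precedes writing")
    if (d.first_published is not None and d.latest_edition is not None
            and d.latest_edition < d.first_published):
        problems.append("latest edition precedes first publication")
    return problems
```

A record that only stores one "date" cannot be checked this way, which may be part of why a reprint year ends up shown as the publication year.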

amelius · on Feb 18, 2019

What would be a scientific approach to compare search results? Let's say I do a search on DDG and on Google, how do I determine which engine provided the most accurate results?

brownbat · on Feb 19, 2019

I think if you have a population of people doing the same search, split randomly across the two sites, things like how long it takes to leave your site through a search result, how far down that result is, and how often people come back to rephrase their search are all good metrics.

Not sure there's an answer for a single search for a single individual though.
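As a minimal sketch of the population-level comparison described above, assuming you already have per-session logs from users randomly assigned to each engine (the `Session` fields and `summarize` function are hypothetical names, not any engine's real logging format):

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Session:
    engine: str              # engine the user was randomly assigned to
    seconds_to_click: float  # time until the user left via a search result
    click_rank: int          # 1-based position of the clicked result
    reformulated: bool       # True if the user came back and rephrased


def summarize(sessions: List[Session]) -> Dict[str, Dict[str, float]]:
    """Aggregate the three metrics per engine."""
    by_engine: Dict[str, List[Session]] = {}
    for s in sessions:
        by_engine.setdefault(s.engine, []).append(s)
    return {
        engine: {
            "mean_seconds_to_click": mean(s.seconds_to_click for s in group),
            "mean_click_rank": mean(s.click_rank for s in group),
            "reformulation_rate": sum(s.reformulated for s in group) / len(group),
        }
        for engine, group in by_engine.items()
    }
```

Lower values on all three would suggest better results, but a real comparison would also need enough sessions per engine and a significance test before drawing conclusions.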

nottorp · on Feb 18, 2019

I thought Google search (without "books") is broken... why is the books search being broken a surprise?

minikites · on Feb 18, 2019

[flagged]

TeMPOraL · on Feb 18, 2019

Do you have a spreadsheet mapping HN user handles to their positions on taxation and UBI?

HN has a lot of users. There's often a pretty low overlap of commenters between different submissions.

minikites · on Feb 19, 2019

My comments seem to strike a chord with a lot of HN users; I've had more than one person go through my comment history and reply to days- or weeks-old comments of mine.