Standard Ebooks: Free public-domain ebooks, carefully produced (standardebooks.org)
820 points by BerislavLopac on Aug 2, 2019 | 129 comments



> Other free ebooks don’t put much effort into professional-quality typography: they use "straight" quotes instead of “curly” quotes, they ignore details like em- and en-dashes, and they look more like early-90’s web pages instead of actual books.

True. I hope you guys get proper funding and keep this project on.

Contribute: https://standardebooks.org/contribute/

(I was thinking a Slack or Discord would be better than Google groups mailing list for this?)


Mailing lists are superior for async communications IMHO for endeavors such as this. Nothing needs to be addressed immediately (and as everyone is a volunteer, realistic expectations should be set for response latency; email helps that, Slack/Discord does not), and the mailing list archive is a natural log of conversations and decisions that are open and accessible (Free Slack only keeps 10k lines of conversation history if I recall). A mailing list is also free (can be, not always, but can be), and does not require a chat client installed.


Mailing lists were superior in the '90s and '00s; now that Discourse/Slack/Discord/etc. exist, there's no reason to use a mailing list except nostalgia. Parsing tons of new emails isn't easy.

Also, I prefer to avoid Google services because of privacy issues.


I soured on discord/slack/etc when their absurdly bad performance caused my laptop to get so hot it probably neutered me.

Seriously though, those services are fine on a powerful tower PC plugged into the wall, but if you're on the move on battery power, they are unbearable.


Try https://cancel.fm/ripcord/, if you want to give them a try.


I like the combo of regular forum (Xenforo, Discourse, etc) and chat (Discord, Slack, etc). Unlike mailing lists, modern forums are actually usable, fun to use, and appealing. And chat provides a place for more conversational community-building.

For example, the Elm community has both. The Discourse forum is technical and business-only yet there's a clean record of these discussions. The Slack chat is where I hang out, get to know people, and participate in more relaxed chit chat about Elm, webdev, and building applications.

Elm used to just have a mailing list but it was obsoleted and shut down with the creation of the Slack group and Discourse forum which were far more popular.


Well discourse forum and matrix chat bridged to IRC through riot.im works very effectively in our open source community.


What do you mean when you say "modern forums are fun to use"? I ask in 100% good faith and I am not being snarky.


They have all sorts of modern features more conducive to discussion and community-building like notifications that someone @mention/replied to you and even editing your post -- features that people generally like. If you don't think that's "fun", fair enough, but I also enumerated other benefits like their broader appeal.

Any community that only has a mailing list could benefit from experimenting with a proper forum. I've seen this experiment broaden a community time and time again as you move away from only selecting for the type of person who likes mailing lists. And notice that HN isn't a mailing list either.

For example, I would imagine that the sort of people interested in high-quality ebooks extend beyond mailing list loving super-techies. Even a subreddit would be a nice option.


My question was simply about how much fun you experience using modern forums. I have never edited a post and thought it was fun.

Some people might say it's useful and others might say it encourages people to comment first and possibly focus on polishing the comment second.


For a start... they aren't a freaking mailing list?

E-mail sucks.


It is bearable when using an efficient client like Ripcord.


You don't know how much I agree. I used Ripcord the other day and forgot software could just be fast. Slack feels like a boulder in comparison.

It's not very full-featured yet, but it's so fantastic I want to pay for it. It's a shame the companies themselves don't offer clients like it.


Counterpoint: Old mailing list conversations are difficult to parse and encourage an "ignore it until the issue goes away" mentality if no one is enforcing a reply rate.

Mailing lists only really work for corporations imo


We're talking open source/free/non-profits here. No reply rate should be enforced unless by project owners (their time, their project, their rules). Some issues should be ignored until they go away. I myself ignore issues from some folks who engage me in my role as an open source tooling maintainer, after I have exhausted my patience working with them and they are not receptive to polite discussion.

> Mailing lists only really work for corporations imo

https://en.wikipedia.org/wiki/Linux_kernel_mailing_list

We've steered off topic though. Feel free to email me if you want to chat further on the topic.


Fascinating subject. It seems like the difference between slack/discord and email, is the difference between a water cooler conversation and an actual sit down meeting.


There's also IRC for water cooler conversations in an open protocol.


> Feel free to email me if you want to chat further on the topic.

Is there any way we can contact you on slack or discord instead?


Why would you force people to use nonfree proprietary software in order to contribute to an open project?


That's a good point. I find Zulip better than Slack/Discord for discourse (even better than mailing lists, with some caveats), and it's Apache-licensed:

https://zulipchat.com/


I knew about Zulip but wasn’t aware of the free hosted plan. I can truly see it now as an alternative for orgs who can’t afford running these things themselves.


I use the free hosted plan and intend to pay for it when I need more features. It's excellent.


IRC?


Free as in "Google"?


With mailing lists, users can use any mail client regardless of server side.


?


Guessing the implication being that Google is ultimately not "free" w/ respect to your personal data and how it is used.


Yes but how does Google enter into this?


It doesn't directly, only as a meme, or archetype. The same way that beer is not directly related to the concept of gratis, but you still use it in the illustrative phrase "free as in beer".


The mailing list is provided by Google Groups


Help! Help! I’m being forced to use non free proprietary software!

Can just have both.


I was hoping I could contribute financially, which could help fund any software or hosting costs they have, but it doesn't look like they're accepting donations (which is also fine, and completely their prerogative).

But! If someone from SE is reading this and it turns out that you just don't have a way to donate because it doesn't seem like people will donate, definitely put a paypal button or something out there. :)


We have minimal hosting costs (ebooks are small) and no software costs, so the rest is just down to time. Luckily, the majority of the process is proofreading, which it turns out people quite enjoy doing regardless, and is easily parallelisable across multiple contributors.

So, so far there's no need for contributions, and it makes things simpler not to need them.


If your hosting costs ever mount, I can recommend Hetzner for hosting, they'll give you a whole bunch of bandwidth for free on their smallest plan ($3/mo) and you can even buy a 3 Tb pipe (IIRC) you can saturate for $20ish a month.

Otherwise, I'm sure some organization will be happy to provide some bandwidth in exchange for a shoutout.


> they use "straight" quotes instead of “curly” quote

why care?


I don't mean this ironically, I really don't understand why would anyone care about such things?


So, not about curly vs straight quotes, but about whether to use en-dash, em-dash, minus, or hyphens on Wikipedia. For example, what mark do you put in "Mexican-American War"?

https://en.wikipedia.org/wiki/Talk:Mexican%E2%80%93American_...

Here's a lengthy village pump discussion, with no outcome: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy...

Look at the word counts for some of the discussion: https://en.wikipedia.org/wiki/Special:Search?search=dash&pre...

Some of these discussions end up on Arbcom which is pretty severe: https://en.wikipedia.org/w/index.php?title=Wikipedia:Arbitra...

It seems a lot of people do care, passionately, about the difference between "" and “”.


Language and typography are art. Proper typography is more aesthetically pleasing.


It's great that these are in a consistent formatting style. When trying to extract some contents programmatically from some Gutenberg texts, I kept running into different formatting styles. That combined with being able to check out the entire repository makes it much simpler to do data processing on the works.

And, of course, fixing more errors is of course a noble goal. Are these corrections going to make it upstream to Gutenberg?


> Are these corrections going to make it upstream to Gutenberg?

It's an issue worth raising to the team. In the spirit of GPL, I think reporting any instances of clear typos in the source text upstream would be a good idea.

The problem is that so much work is done to the text as part of Standard Ebooks production that we can't exactly just submit a single patch or diff. It would be difficult to separate the textual corrections from stylistic changes in an automatic way, unless we were to require typo corrections to occur in a single commit. We're currently encouraged to use an [Editorial] tag when modernizing spelling such as "any one" -> "anyone", so perhaps we should see about a [Transcription Error] tag for obvious typos.

The upshot is that all of the books' sources are hosted on GH. So an interested party could, in theory, review the commit history and pull out what look to be typo corrections. See, for example:

https://github.com/standardebooks/emile-gaboriau_the-lerouge...

Failing that, the contributor could simply keep track manually of any typos they fix and report them to Project Gutenberg.
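
The tagging idea above could be sketched roughly like this. It is only an illustration, not the project's tooling: the `[Transcription Error]` tag is just the proposal from this comment, and the commit subjects below are made up for the example.

```python
import re
from collections import defaultdict

# Matches a leading bracketed tag such as "[Editorial]" at the start of a commit subject.
TAG_PATTERN = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def group_by_tag(subjects):
    """Group commit subject lines by their leading [Tag]; untagged ones go under None."""
    groups = defaultdict(list)
    for subject in subjects:
        text = subject.strip()
        match = TAG_PATTERN.match(text)
        if match:
            groups[match.group(1)].append(match.group(2))
        else:
            groups[None].append(text)
    return dict(groups)


# Hypothetical commit subjects, invented for illustration only.
example_subjects = [
    "[Editorial] any one -> anyone",
    "[Transcription Error] teh -> the",
    "Typogrify and clean up markup",
]

# Only the commits carrying the (proposed) typo tag would be reported upstream.
typo_fixes = group_by_tag(example_subjects).get("Transcription Error", [])
```

In practice the subject lines could come from something like `git log --format=%s` run in a book's repository.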


"Modernizing" is a very questionable thing to do IMO.

Fixing typos is fine I guess, but books are the result of an era and grammar or writing style is an inherent part of a book that should not be altered.


We’re pretty light touch. For example, the biggest change a typical novel gets is amending “to-day” to “today”.


That's true, it changes the book materially, I think.


On the other hand, modernising the texts might not be what Gutenberg wants, if they want to keep the original errors from the books. I haven't looked it up, but I have been involved in other book projects where it was more important to keep everything as it was printed.


> Standard Ebooks puts significant work into designing, formatting, marking up, and hosting our ebooks. While some think we could, or even should, release our work with some kind of copyright notice, instead Standard Ebooks dedicates the entirety of each of our ebook files, including markup, cover art, and everything in between, to the public domain.

https://standardebooks.org/about/


Public Domain projects that subtly argue for an extensive view of copyright in ambiguous (in the best case) situations make me somewhat suspicious.

Editing is a lot of work, and their efforts are appreciated. But most of that involves the application of existing rules (curly quotes etc.) and therefore doesn't meet the creativity standard of copyright.

I guess the texts are out there anyway, and it doesn't make much of a difference. But I'm reminded of the art world, where many a painting is long out of copyright, yet only the museums have access, and they insist that setting up a lightbox and taking a photo is a creative endeavour worthy of protection from the prying eyes of the non-paying public.


I was curious about the copyright situation of photos of old paintings that you referenced in your last paragraph. I found this article that explains their status: https://www.huffpost.com/entry/museum-paintings-copyright_b_.... The bottom line:

> [T]hose who are hesitant to use photos of works that are in the public domain […] should know that under the law if the image is “slavish,” a mere reproduction, a plain unadorned exact image, they can use it and do not have to pay anyone a licensing fee.


Not all the sources come from Gutenberg, for a start. But for those that do, I usually keep a running list of proofing corrections as I go and submit them back. Gutenberg are pretty responsive to any changes submitted if you supply a link to source scans as well.


A similar project for books in French (French books and books translated to French): https://www.ebooksgratuits.com/ebooks.php


Thanks! I've been looking for a website with free French ebooks for quite some time...


pretty good resource, glad to see fellow french learners (or francophones?) sharing it


A book that might be of specific interest for HN’s audience is the recent production for Standard Ebooks I did of Charles Babbage’s autobiography:

https://standardebooks.org/ebooks/charles-babbage/passages-f...

It covers a plethora of subjects, but devotes a few chapters to the Difference Engine and the political difficulties in getting it funded. Bonus MathML (rendered to PNGs in most readers but real for the Kobo), and all diagrams support both normal and white-on-black dark mode if you’ve got your reader set up like that.



Dang, could you lift vvillyd's rate limit as new user in case this still exists?


We removed it a little while ago, in fact. Thanks for the heads up! (Emailing hn@ycombinator.com is quicker and more reliable for next time.)


Thanks.


Is there any way to download them all? Ebooks are very small so it shouldn't be a problem to make an archive, or at least have a way to use curl/wget to download them all from a directory.


There’s something called OPDS. Sort of like a file that defines a library and its contents.

https://standardebooks.org/opds/

I’ve never really been able to get it to work with Calibre, pretty much the standard go-to piece of software for all things ebooks.

I would be interested in hearing about any tips for getting an OPDS feed working.


I ended up using this to download all the azw3 files for my kindle, it's probably not the best you could do, so feel free to use it as reference for yourself if you might do something similar.

  curl -s https://standardebooks.org/opds/all | grep -oE "/(.*).azw3" | sed -e"s/^/https:\/\/standardebooks.org/" | xargs -n 1 curl -O

P.S. Calibre seems to only work with them and the Kindle when you send over USB (I keep my Paperwhite in airplane mode, signed out, ads disabled)[0]

[0]https://news.ycombinator.com/item?id=20596300
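
For anyone who would rather parse the feed than grep it, a rough Python sketch of the same idea follows. It assumes the feed is Atom XML whose `<link>` elements carry the download paths in `href`; the sample feed and its paths are made up so the snippet runs without network access, and `find_links` is just an illustrative name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Standard Atom namespace, which OPDS catalogs build on.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_links(feed_xml, base_url, suffix=".azw3"):
    """Return absolute URLs for every Atom <link> whose href ends with the suffix."""
    root = ET.fromstring(feed_xml)
    urls = []
    for link in root.iter(ATOM_NS + "link"):
        href = link.get("href", "")
        if href.endswith(suffix):
            # Resolve site-relative paths against the feed's own URL.
            urls.append(urljoin(base_url, href))
    return urls


# A tiny made-up feed standing in for the real one; the paths are hypothetical.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example Book</title>
    <link href="/ebooks/example-author/example-book/example.epub" type="application/epub+zip"/>
    <link href="/ebooks/example-author/example-book/example.azw3" type="application/x-mobipocket-ebook"/>
  </entry>
</feed>"""

links = find_links(SAMPLE_FEED, "https://standardebooks.org/opds/all")
```

Each URL in `links` could then be handed to whatever downloader you prefer, the same way the one-liner above pipes them to `curl -O`.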


For iOS there’s something called KyBook which reads that OPDS link and shows a list of books with cover art, book name, author and tags pertaining to the book's subject matter. It does not allow one to search the OPDS library in any way. I was able to download a book using the app, and from within the app move the book to iCloud. From within the Files app I found the KyBook folder, selected the book and, using share, opened it with iBooks, and it is now in my iBooks collection. I found the app here: https://www.maketecheasier.com/best-ebook-reader-ios/


It's always great to see public domain books being made available, and standardebooks is certainly worth a visit. However, while I read quite a lot and in particular the sort of books that are available there, I mostly give the site a miss purely because of its design. I don't want to seem too sarcastic, but having huge images as a listing for books is odd when most of the users can read quite well. I'm probably a bit sensitive about this, since our local library does the same thing - it almost looks like there is something about uncompromisingly textual information that provokes a reaction from web designers.


I just want to say that I appreciate all the effort.

I've finally gotten around to reading many of the classics using your ePub files.


Not only is this an excellent project, I think this is also an incredible collection of books you've chosen to feature. I also really appreciate the art choices that have been made for the covers.

I have many of these in epub from Gutenberg, but plan to replace them with your versions when I have some time.

I know others have already asked about bulk download--have you considered offering a torrent of the full library or possibly one for each file format?


This is fantastic, and immediately I want to try and contribute engineering time to it. I've tried reading Gutenberg ebooks before and gave up because of how inconsistent and unreadable they could be.

Is there a wishlist of tools/software out there that someone could contribute to?


I must admire the beautiful cover art selected for the ebooks.


Thanks, it’s honestly the hardest part of the production process.


Are there any plans to support languages beyond English?


Doesn't seem likely: https://standardebooks.org/contribute/accepted-ebooks

> Types of eBooks we don't accept

> ...

> * Non-English-language books. Translations to English are, of course, OK.

I don't see any rationale for it on the mailing list, only a message from Alex Cabal two years ago stating "not at the moment": https://groups.google.com/d/msg/standardebooks/JdVpCm3ckGg/i...

Alex explicitly does not want the "Standard Ebooks" name or mailing list used to coordinate similar projects elsewhere (including other primarily English-language nations), due to copyright issues: https://groups.google.com/d/msg/standardebooks/qRDTb-hHMxk/z...

Nor is Alex aware of any other similar projects: https://groups.google.com/d/msg/standardebooks/ikg07cqkABY/Q...


It looks like they’ve decided not to publish any non-English books [1]. It’s a pity – I much prefer reading books in their original language if I’m able to understand it, and I was even considering contributing some German books to their collection. Maybe it would complicate the publishing process a bit though since different languages have different practices for things like punctuation.

[1]: https://standardebooks.org/contribute/accepted-ebooks


> I much prefer reading books in their original language if I’m able to understand it

Absolutely. To the extent that I have trouble focusing on texts I know to be translations, unless there are inescapably good reasons for them to be, i.e. they come from a language I have no chance of understanding.

Currently battling to resurrect my highschool German. Getting there, but cursing myself for not starting out with something a wee bit more accessible than Thomas Mann...


The tools are available. Maybe someone would be willing to make a similar project for other languages.


The only one I know of is http://projectoadamastor.org in Portuguese. I can't tell whether it uses the same tooling, but the project generally predates Standard Ebooks.


I guess "print on demand" would be a good source of funding for such projects.


I'm really curious why a book (https://standardebooks.org/ebooks/charles-w-chesnutt/the-con...) originally published in the US would need to include this:

"This ebook is only thought to be free of copyright restrictions in the United States. It may still be under copyright in other countries. If you’re not located in the United States, you must check your local laws to verify that the contents of this ebook are free of copyright restrictions in the country you’re located in before downloading or using this ebook."

Can anyone speak to this?


A few countries have copyright terms longer than the US. For example, Mexico is life of the author + 100 years, while the US is life + 70 years.


Most of the world is life of the author + 70 years, but in the US anything published in 1924 or later is still under copyright (with a few exceptions, for example if copyright wasn’t renewed in the 60s).
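
To make the "life + N" arithmetic concrete, here is a small sketch. It assumes the usual convention that a term runs to the end of the calendar year, so a work becomes free on 1 January of the following year; the death year is a hypothetical input and the function name is illustrative.

```python
def public_domain_year(death_year, term_years):
    """First calendar year a work is free under a 'life + N years' rule.

    Assumes the term runs through the end of the calendar year, so the
    work enters the public domain on 1 January of the year after it ends.
    """
    return death_year + term_years + 1


# Hypothetical author who died in 1932, compared under the two terms mentioned above.
life_plus_70 = public_domain_year(1932, 70)    # a life + 70 country
life_plus_100 = public_domain_year(1932, 100)  # a life + 100 country
```

The same book can therefore be free in one country decades before it is free in another, which is why the notice asks readers to check locally.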



Just because it was first offered for sale in the US doesn't mean the owner cannot apply for copyright in other countries.


I wonder if the azw3 files work with the Kindle's latest display engine? Specifically, can ragged-right be turned on? Are hyphenation hints embedded?


I don’t have a Kindle so haven’t tested, but I believe we build the AZW3 files from the epub2 ones, which have hyphenation baked in using the Python hyphenation library.


For the Amazon-compatible "azw3" files that I'm seeing, I'm curious why the book cover thumbnail images are a separate download from the ebook file itself?

Unless I'm missing a trick, it seems like you have to use Calibre (or some other application) to re-build the "azw3" file with the cover thumbnail properly embedded. Why not just ship the ebook files like that to begin with?


Kobo is far easier: just download the epub onto your phone and use the import button in the app, and it will be imported automatically.


According to the webpage:

> Thanks to a long-standing bug in the Kindle software, side-loaded ebooks don’t display cover images automatically.


You can also transform the epub to a format compatible with Kindle and save a few clicks. The epub contains the cover.


I applaud this idea & hope it goes well. I'm always a little disappointed when I download a book from Gutenberg and the formatting makes it virtually unreadable.

I'd also like to mention Feedbooks, which has a very nicely curated set of ebooks: http://m.feedbooks.com/publicdomain

(I have no affiliation with them)


I got really excited, but then I noticed that there was no PDF option. I hate epubs with a passion, because all epub programs suck.


Have you tried FBReader (https://fbreader.org/)? The Android version has so far met my needs, and the Linux version works well (in my opinion).


For reference, this is what a document viewer should look like, in my opinion: https://i.imgur.com/kkhk2dX.png [#]

Notice the gray margin outside the sheet, and the padding inside the sheet. The first one gives you a general frame of view, and also you don't want the document to use the whole screen width when using a 16:9 monitor or similar screen.

The padding is necessary because you don't want characters too close to a margin, else they look like they're escaping the sheet.

I haven't been able to replicate this setup with Calibre, FBReader or any other epub reader.

Fonts are another issue. Default fonts always suck. FBReader uses Dejavu Serif, that, in my opinion looks just bad. I changed it to Bitstream Charter, which looks decent, but then the line justification looked wrong, I changed that, and then paragraph margin looked wrong. There's a million little things that look horrible by default and you have to spend an hour per book setting up your reader so that it looks right.

I've tried generating PDFs with Calibre, the result: giant ugly fonts, zero sheet padding, nonsensical spacing, etc.

At some point you just give up and avoid epubs like the plague.

[#] The book is called Crypto 101, by lvh.


What about any of the ways to convert epub to pdf yourself? Are they no good?

If anything, I think an HTML version that you can view right on the website would be the best format addition. It's always interesting to me when a website doesn't offer a browser-native document format as an option to view text.


I've tried generating PDFs with Calibre, the result: giant ugly fonts, zero sheet padding, nonsensical spacing, etc.

I could spend the time trying to make them look right, but I don't want to spend my time that way. I prefer learning things that will give me more satisfaction per second spent.


Submit a patch. It's easy to moan about a missing feature, but isn't terribly productive.


I use Freda in Windows and Android, and Foliate in Linux.

I find any of them far superior to PDF.


I convert epubs to mobi and read on my Kindle. PDFs are horrific on a Kindle.



This is a great idea.

Another ebook non-profit I'd like to see is one that shepherds books through the copyright maze. No doubt there are scads of books in the public domain that no one has proven are actually there. Perhaps this exists already but it strikes me as a good separate, and highly targeted, kind of effort.


BTW does Kindle let you load your own DRM-free ebook files instead of buying books on Amazon? I use a PocketBook (pocketbook-int.com) which emulates a mass storage device and lets me read everything. I once considered buying a Kindle but heard it won't let me load bare files this way. Is this true?


You can sideload the files via USB, just copy the files over to its internal storage and it will appear on the home screen.

You can also associate a Kindle with an email address, and email files to that address. The file appears as part of your cloud collection.

For sideloading, the books need to be azw3 or mobi. For emailing the book needs to be mobi. In either case epub is not accepted.
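
Those rules fit in a small lookup table. The sketch below only encodes what this comment says (as of this thread), not anything from Amazon's documentation, and the names are illustrative.

```python
# Formats each transfer method accepts, as described in the comment above.
ACCEPTED_FORMATS = {
    "usb": {"azw3", "mobi"},
    "email": {"mobi"},
}


def can_transfer(filename, method):
    """Return True if the file's extension is accepted for the given transfer method."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return extension in ACCEPTED_FORMATS[method]


# An azw3 file can be sideloaded over USB but not emailed; epub fails both ways.
usb_ok = can_transfer("book.azw3", "usb")
email_ok = can_transfer("book.azw3", "email")
```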


Calibre is a great way to manage synchronisation and reformatting to Kindle-supported file types; it even allows you to select your Kindle model to make sure it looks good:

https://calibre-ebook.com/

https://www.howtogeek.com/73979/how-to-organize-your-ebook-c...


I have had sideloaded files mysteriously disappear from a Paperwhite 3.


My Kindle Paperwhite will read PDFs. There are some limits, and MOBI displays better, but it is possible to read a PDF on a Kindle.


Very informative. Thanks.


I've forgotten to mention my PocketBook also has a microSD card slot. It came with just 1 GB of internal memory, ⅓ of which is occupied with the OS (Linux) so I've just bought an additional 16 GB microSD card to extend it recently. Now I just plug the card into my laptop SD slot (using a microSD-SD converter that came with the card) and don't even need to use a cable (which never worked well, it often happened that the device would charge but won't establish a connection).


Absolutely. I've been using Kindles for a decade and have never bought any ebooks from Amazon.

The only functionality issue I run into is that, as far as I can tell, Amazon has a feature where if you buy the book from them then they keep your progress synced between your Kindle and their phone apps. Can't use that with books from other sources.


I generally upload them to a file-hosting website--0x0.st works well--and download them to the kindle from there. Use calibre to convert them to mobi format if they're pdfs.


Calibre will convert your books for the Kindle, so yes.

I have a Kobo that lets you read even more formats than Kindle, like .cbz for comics.


I'm a fan of Standard Ebooks for my Kobo Aura One reader. Their EPUB versions look great.


Great project, following your progress and will be getting books from here since the formatting is so much nicer than some other places.

Question: how is Philip K Dick in public domain already?


Several of his works were published before 1964 and the copyright was not renewed.

https://en.wikisource.org/wiki/Author:Philip_Kindred_Dick


Is there any way to browse by genre and year published?


You can search by genre, but the only way I noticed to do it is to click on a book with the genre you want, then there'll be a link to browse other books with the same genre tags.


Are there any good, open source e-readers out there?


There's [Foliate](https://johnfactotum.github.io/foliate/) for reading epub files on Linux.


I have good experience with FBReader for Android (from F-Droid). For some reason FBReader for Linux (from Debian) seems more problematic.


Any chance to provide mobi format? The azw3 files do not work on my Kindle Paperwhite.


How are you sending the file to your Kindle? AZW3 should work if you're connected to your computer via USB and dragging over manually or sending over via Calibre, but I've seen it not work if you're trying to send it to the email addy associated with the kindle. For that, you are correct, MOBI is usually the preferred option.


Yup, I am trying to email it.


Won't work, I'm afraid! And the project leader has already shot down that request in the past. See here:

https://groups.google.com/forum/#!searchin/standardebooks/mo...

It should still work if you transfer it over USB, but if you're trying to do it all wirelessly, simply download the EPUB and convert it to MOBI, and you should be good to go.


This is absolutely amazing!


How many books have been produced? What is the goal/hope?


This made me smile: (from the style guide)

> Do convert from logical punctuation to American punctuation where possible.


[flagged]


Without commenting on the contents, it wasn’t fully published in English until 1939, which means it doesn’t arrive into the US public domain for another 15 years. It’s also down to personal preference what people work on.
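
The arithmetic behind that estimate can be sketched as below. This is a simplification: it assumes the 95-year term that applies to works first published in that era whose copyright was properly maintained, with the work becoming free on 1 January after the term ends. The function name is illustrative.

```python
US_TERM_YEARS = 95  # assumed term for works of this era with copyright maintained


def us_public_domain_year(publication_year):
    """Year on whose 1 January a work published in `publication_year` becomes free."""
    return publication_year + US_TERM_YEARS + 1


# The 1939 publication discussed above, and 1923 as a sanity check:
# works from 1923 entered the US public domain at the start of 2019.
pd_year_1939 = us_public_domain_year(1939)
pd_year_1923 = us_public_domain_year(1923)
```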


Apparently someone interested in creating a proper ebook within the project's production standards volunteered to create one.


Mein Kampf is pretty long and, frankly, boring (unlike The Communist Manifesto that reads like a poem).


Great idea.

Now consider something similar for Audiobooks.


Are you familiar with LibriVox? https://librivox.org/


Yes but the audiobook, voice, and reader quality has been very hit and miss.


Loyal Books has a lot of public domain books as audio books. In my experience the quality varies widely but there is plenty of good content.

http://www.loyalbooks.com/


Unfortunately none of the formats given work with Amazon's "email to kindle" system, which is the most convenient way to load books-- it allows you to download a PDF on your phone and send it to a special email address associated with your Kindle device. Considering all the work this site has already done preparing the book files, it seems like they might as well generate PDF files using a page size roughly equal to that of the most common Kindle readers.


Amazon's "email to kindle" system also accepts books in the mobi format, which would be preferable to mapping PDF page sizes to Kindle reader screen sizes.



