Project Gitenberg | Hacker News

transfire on Aug 23, 2014 | [–]

The idea has a lot of merit. So for that two thumbs up. But I would much rather see a separate website for it. Using Github feels very strained. Perhaps Github would be willing to help set you up with your own instance of their platform which you could modify to better suit the purpose. Maybe even Project Gutenberg would be interested in participating in that.

BTW, I recently learned that Gutenberg was not his name and is really a significant historical inaccuracy. His name was Hannes Gensfleisch. "Gutenberg" was just one of the places his family resided.

a3_nm on Aug 23, 2014 | | [–]

> Perhaps Github would be willing to help set you up with your own instance of their platform which you could modify to better suit the purpose.

Alternatively, you could use, and customize, an open-source self-hostable alternative, such as gitorious https://gitorious.org/gitorious

> BTW, I recently learned that Gutenberg was not his name and is really a significant historical inaccuracy.

Do you have any source for this fact that Gutenberg wasn't using "Gutenberg" as his last name?

csandreasen on Aug 23, 2014 | | | [–]

I don't think the 'Gutenberg' name could be described as a historical inaccuracy, but there is some truth to it. From Wikipedia:

Wallau adds, "His surname was derived from the house inhabited by his father and his paternal ancestors 'zu Laden, zu Gutenberg'. The house of Gänsfleisch was one of the patrician families of the town, tracing its lineage back to the thirteenth century." Patricians (aristocrats) in Mainz were often named after houses they owned. Around 1427, the name zu Gutenberg, after the family house in Mainz, is documented to have been used for the first time.

sethish on Aug 23, 2014 | | | | [–]

It's all checked into git now, and I have an API to fetch the repo names via github's api. Migrating from github to self-hosted would now be easier than doing it from scratch.
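A minimal sketch of what fetching the repo names could look like, not the author's actual code. It assumes the public GitHub REST endpoint for listing an organization's repositories and the `GITenberg` organization name seen in the links in this thread; the function names and the `User-Agent` string are made up for illustration.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

API_URL = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_page(org: str, page: int) -> List[dict]:
    """Fetch one page of repository objects from the GitHub REST API."""
    request = urllib.request.Request(
        API_URL.format(org=org, page=page),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "repo-lister-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def repo_names(
    org: str, fetch: Callable[[str, int], List[dict]] = fetch_page
) -> Iterator[str]:
    """Yield repository names page by page until an empty page comes back."""
    page = 1
    while True:
        repos = fetch(org, page)
        if not repos:
            return
        for repo in repos:
            yield repo["name"]
        page += 1
```

Calling `list(repo_names("GITenberg"))` would walk every page. Unauthenticated requests are rate limited, so for tens of thousands of repos an access token would probably be needed; that part is left out here.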

Github does have issues with Unicode repo names, so it may be worth moving elsewhere for that reason.

But in the short term, I like Github because they have the most to gain by making git easy to use. If I can get away with people editing books in their browsers on github then we have an editing/rendering toolchain right out of the box!

walterbell on Aug 23, 2014 | | | [–]

Anyone creating an OSS collaborative authoring platform could benefit from lessons learned in commercial authoring platforms:

http://blog.inkling.com/2014/06/problem-of-structured-author...

http://alistapart.com/column/wysiwtf

We need momentum around one integrated OSS toolchain, including illustration (Inkscape) and CMYK color for printing (Scribus). This would help editors and publishers who want to move away from rental pricing for authoring software.

As anyone who has tried to find good books in a sea of free books knows, there isn't a standardized way of collaboratively improving book Metadata. This needs to include e-production history, translators, print publication history and content objects _within_ books, viz. Doug Engelbart's purple numbers, http://www.dougengelbart.org/about/ohs.html

sethish on Aug 23, 2014 | | | [–]

I would love to help standardize this workflow, but it has to be a community process and discussion. I'm meeting with the inkling folks sometime next week to collect more information.

re: Metadata. I'm slowly working on that problem. I'm a HUGE fan of purple numbers.

walterbell on Aug 23, 2014 | | | [–]

> it has to be a community process and discussion

Would you recommend any existing orgs, mailing lists or forums where there's already an active community for discussion?

sethish on Aug 23, 2014 | | | [–]

GITenberg has a mailing list: https://groups.google.com/forum/#!forum/gitenberg-project

And there has been some conversation about the project on the OKFN-humanities list: https://lists.okfn.org/mailman/listinfo/open-humanities

OKFN is organizing a skype call about the project on the 1st of September 6pm UK time that will be announced on the humanities list in the next few days.

sethish on Aug 23, 2014 | | | [–]

GITenberg is more of a reference to Project Gutenberg at this point than Johannes. But no, I didn't know that at the time :-(

DavidAdams on Aug 23, 2014 | | [–]

My biggest question is this: did the idea for this originate with the pun, or did they think up the great pun afterward?

sethish on Aug 23, 2014 | | [–]

Idea first, pun after. But thanks!

gluejar on Aug 23, 2014 | | | [–]

Funny thing was, about a year behind Seth, I thought up "Project Gitenhub" made a few repos, found GITenberg, and decided to direct my meager efforts in that direction. For sure the pun was a huge motivator!

prosody on Aug 23, 2014 | | [–]

What advantage does this offer over Project Gutenberg's own Distributed Proofreaders[1]?

[1] http://pgdp.net

sethish on Aug 23, 2014 | | [–]

PGDP does a series of passes on a book. They don't continually update it. If a book makes it through their process with a typo, or, more commonly, made it through the transcription process 20 years ago, there aren't good systems in place to fix those issues.

fernly on Aug 24, 2014 | | | [–]

As a long time DP contributor, I'd say the models could not be more different. DP is elaborately and deeply organized around the model of a great many people doing a great many small units of work (stop by once a day and proofread just one page for very specific things). This minimizes the responsibility any one person needs to feel for a book. And like the citizen science projects at zooniverse, individual mistakes are corrected in a multi-pass process where multiple people (5 or 6) see every page.

The new project follows the Github programming model, which means if a person wants to contribute to a book she has to clone the book, make local changes, push, issue a pull request. This is far, far more complex than stopping by pgdp for a ten-minute proofing session. Very few of the drive-by proofers at DP could manage that technically, or would want to be that involved.

Most importantly, it lacks the QC inherent in the DP model of having multiple later reviewers catching earlier errors. Who will do line-by-line vetting of the accuracy of pull requests? Besides the inevitable detail mistakes, there are potential problems similar to those faced by Wikipedia: who will even notice if some local zealot decides to insert editorial comments in, or to bowdlerize the language of, some classic?

It's true that DP's work ends when the book is posted to PG, and that PG has only a feeble update method (email to their errata contact). I could certainly see something like this project as an adjunct to PG, dedicated to continually refreshing the library, but with editorial control over which books go into it.

sethish on Aug 24, 2014 | | | [–]

Oh yes. GITenberg is not a replacement for DP in the slightest. New books aren't likely to come out of GITenberg as it currently exists. That is what distributed proofreaders is for.

At some point, I would like to investigate DP tools and see if there is something I could contribute.

gluejar on Aug 23, 2014 | | | [–]

Gitenberg and DP are quite different in their objectives. DP was created primarily to clean up OCR text only. Modern ebooks have a whole 'nother dimension to presentation: fonts, graphics, reflowable layout, footnotes, links, indices, etc.

For example, I contributed enhancements to "Alice in Wonderland" https://github.com/GITenberg/Tenniel-Illustrations-for-Alice... (from a version in mobileread). DP produced one edition; there are in fact many public domain versions of Alice. How to keep them all straight? PG has no answer, but version control systems allow us to use fork and merge processes to start to deal with the way the real world works.

kbar13 on Aug 23, 2014 | | [–]

One thing I would like to see out of this project is a better version control system for prose. Git is great for code, but it's not at all any good for editing text.

spain on Aug 23, 2014 | | [–]

+1, in code the basic unit is usually a line, but in prose it is the sentence. I've tried using git with LaTeX and it always ended up with weird situations where you had to put each sentence on a different line to make it work effectively.
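The "one sentence per line" workaround can be automated before committing. Here is a rough sketch; the splitting rule is a naive assumption (a sentence ends at `.`, `!` or `?` followed by whitespace and a capital letter or quote), so abbreviations such as "Mr. Smith" will be split wrongly.

```python
import re

# Naive boundary: ., ! or ? followed by whitespace and then an uppercase
# letter or an opening quote. This is an assumption, not a real tokenizer.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow a paragraph so that each sentence sits on its own line."""
    text = " ".join(paragraph.split())  # collapse existing hard wraps
    return "\n".join(_SENTENCE_END.split(text))
```

With the text stored this way, a line-based diff reports changed sentences rather than re-wrapped paragraphs.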

azernik on Aug 23, 2014 | | | [–]

I think this could be made easier by using the git database as a substrate - there's nothing in that that's tied to line-oriented files (maybe the compression algorithms make some assumptions?).

It's the diff-viewing infrastructure that needs to be completely replaced for prose.
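To illustrate the point about the object database: a blob's id depends only on the bytes of the content, not on how those bytes break into lines. The sketch below reproduces the id computation for the classic SHA-1 object format (repositories created with a different hash algorithm would differ).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # A blob is hashed as the header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

Nothing in this step looks at newlines; line orientation only enters in the diff and merge tooling layered on top.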

davvid on Aug 23, 2014 | | | [–]

Indeed. git diff --color-words will often do the trick for prose, but the real solution is plugging a sentence-oriented diff viewer into git difftool. git difftool is extendable by setting git config variables, so it's very much doable.
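A sentence-oriented diff of the kind described here could be sketched with Python's `difflib`. This is an illustration only: the script name `sentence_diff.py` and the tool name `sentences` used in the note are hypothetical, and the sentence splitter is deliberately naive.

```python
import difflib
import re
import sys
from typing import List

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(text: str) -> List[str]:
    """Split text into sentences, ignoring how the lines were wrapped."""
    flat = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def sentence_diff(old: str, new: str) -> List[str]:
    """Return unified-diff lines where the unit of change is a sentence."""
    return list(
        difflib.unified_diff(
            sentences(old), sentences(new),
            fromfile="old", tofile="new", lineterm="", n=1,
        )
    )


# When run with two file paths, print the sentence-level diff between them.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as a, \
            open(sys.argv[2], encoding="utf-8") as b:
        print("\n".join(sentence_diff(a.read(), b.read())))
```

It could then be wired in with something like `git config difftool.sentences.cmd 'python sentence_diff.py "$LOCAL" "$REMOTE"'` and invoked with `git difftool --tool=sentences`. Note that flattening the text throws away paragraph breaks, which a real tool would want to keep.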

zwp on Aug 23, 2014 | | | | [–]

Perhaps "git diff --word-diff" helps? I don't see a similar option for "git show" though and that seems important :(

I think you're right: if the diff-viewing infrastructure was abstracted (plugin?) it could do this and more (eg word-diff-by-paragraph, image-diff).
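For comparison, a word-level diff is easy to sketch outside git as well. The markers below imitate the `[-removed-]` and `{+added+}` style of git's plain word-diff output, but the matching is done by `difflib`, not by git's own algorithm, so the results will not be identical.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

The same structure would work for a paragraph-scoped variant: split into paragraphs first, then run `word_diff` on each changed pair.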

guynamedloren on Aug 23, 2014 | | | [–]

Git is actually great for text (code is just text after all), it's just a matter of a better display for the diffs. Might want to have a look at a project I'm working on, Penflip [1], which does exactly this. It uses Git to track changes in markdown files, and has a web interface like GitHub but geared towards writers instead of developers. The in-browser editing interface is a fork of http://prose.io/, which allows for git commits (and other actions) from the browser.

[1] https://www.penflip.com/

kyllo on Aug 23, 2014 | | | [–]

That exists, and it's made by a YC alum, Nate Kontny. It's called Draft. https://draftin.com

cpach on Aug 23, 2014 | | | [–]

I’ve heard good things about Draft. But since it’s proprietary it’s not a good fit for people who want to develop and modify the tools they use.

gluejar on Aug 23, 2014 | | | [–]

I discussed this with one of the Git maintainers; it seems that the main work needed is a modification of the way text is chunked (into lines), and this chunking is already suitably factored.

chippy on Aug 23, 2014 | | | [–]

Check out http://prose.io/; it might be suitable. It's aimed at the CMS end of things, though.

johnchristopher on Aug 23, 2014 | | | [–]

They really should rewrite their "easy" guide to set up the starter project for prose, it's not clear at all what should be done to get it running.

Harpjs is two steps ahead on this one.

ldng on Aug 23, 2014 | | [–]

That's a great step. I was toying with a related idea last week, actually. To me, the next great step would be to create around that a framework/tools to help with translation of those ebooks.

What often happens is that publishers have one translation of a book, say Les Misérables, and keep reprinting the same translation regardless of its quality. So I was thinking that a GitHub-like platform to foster translation would be a great idea. Looks like gitenberg might be just the project for that.

But maybe it should pick a clone (gitlab?), self-host, and fork/extend that tool to ease the use so that non-developers could use the site without git knowledge. Then again, tailoring for translation might not be needed.

guynamedloren on Aug 23, 2014 | | [–]

> But maybe it should pick a clone (gitlab?), self-host, and fork/extend that tool to ease the use so that non-developers could use the site without git knowledge

We're on the same wavelength here. I forked GitLab to make a 'github for writers'. Still backed by git repos, but a simplified web UI for less technical users. If you're interested in working together, let's chat.

ldng on Aug 23, 2014 | | | [–]

Interesting! We definitely should talk. And with the gitenberg guys too. I could see a nice synergy here. I have to go now but I'll pm you later this weekend.

kaoD on Aug 23, 2014 | | | | [–]

Hey, I was writing a very long comment about GitBook, Penflip, Leanpub, Softcover... then found your profile and realized you're Penflip's owner.

I'll still reproduce the comment here in case you can get some insights or anyone else is interested.

---

I've been researching Markdown+Git book publishing (inspired by 'Markdown to Ebook'[0]) and found that there are already three 'GitHubs for writers': GitBook[1], Penflip[2] and Arturo.io[3]. Each has its own strengths and weaknesses:

# GitBook

Just a publishing platform backed by Git.

## Like:

- Standalone app.

- It is a bookstore.

- Publishes to major stores.

## Dislike:

- Ugly MSWord-like typesetting.

- There's no "social collaboration" at all, seems like it's just backed by Git. Not sure if small-scale collaboration (couple authors) is supported in the app or you have to deal with Git complexities yourself.

- Seems technical-oriented. No fiction categories.

# Penflip

Penflip seems to fit your idea more (note: now I know why it fits your idea :P), being collaborative like GitHub.

## Like:

- Collaborative.

## Dislike:

- It has no integration with bookstores.

- It's not a bookstore and you can't discover books easily.

- Looks more like a "free books" platform using 'free' in the FSF sense.

- It's hard to find a complete book to peek into, but the output seems to be just as ugly as GitBook's. AFAIK they let you customize output, but it seems typesetting is not LaTeX-like and I suspect it won't be up to the job. Defaults are very important; it should be beautiful right out of the box.

# Arturo.io

Arturo.io's page is currently down (some cert error). It looked just like a bunch of webhooks for GitHub. Seemed immature and not for less technical users (still requires Git/Hub knowledge).

---

As non-Git alternatives I found Leanpub[4] and Softcover[5].

# Leanpub

Social bookstore and publishing platform.

## Like:

- Lean Publishing.

- The author-reader interaction is awesome.

- Their PDF output is beautiful.

- It's a bookstore (90% royalties!) with social-network aspects.

- Makes it really easy to create bundles, sharing royalties with other authors, etc. Really awesome feature.

- Tools for marketing are awesome. Integrated with Google Analytics.

## Dislike:

- I can't download their toolchain (but a local workflow is somewhat reproducible following 'Markdown to Ebook'[0])

- Does not rely on Git, but on Dropbox. No proper version control.

- No collaboration.

- Does not publish to major bookstores (but allows you to do so).

# Softcover

Major Leanpub competitor. In the publishing aspect seems to be pretty much the same, but their store philosophy is different. Their aim is not to be a bookstore but just a payment processor. You deal with your own marketing, set up your own domain.

## Like:

- I can download their toolchain. As far as I can tell, I could self-publish not using their platform without hassle. Not being tied to a provider is a HUGE selling point for me.

- AFAICT still supports Lean Publishing with their generated landing pages.

- Their PDF output is beautiful.

- It's a book payments processor. 90% royalties!

- Lets you control your own marketing, domain, etc. (has downsides)

## Dislike:

- Since you control your own marketing there are no social network aspects. Each book is supposed to be its own page. No bookstore. No way to explore and discover other books.

- A bit too technical.

- DIY version control, no integrated collaboration.

- Does not publish to major bookstores (but allows you to do so AFAICT).

Even though I'd love a Git-backed workflow I'll stick with one of Leanpub or Softcover because of how beautiful they look. I can still Git it myself. Major selling point for me and the non-techie friends I've been talking to.

The bookstore integration in both is a big selling point too!

I still consider Leanpub since I can replicate their toolchain and seems so easy and powerful for my non-tech friends. Letting users discover your book in the bookstore is really useful.

--- EDIT:

Now that I know you're from Penflip I will summarize:

I can see you're not a competitor with publishing platforms. As far as I can tell, you're more like GitHub, in the private-repo business instead of taking a cut from sales. Penflip seems great for social collaborative stuff, but I wouldn't choose it if I planned on selling my book.

As I said, typesetting is very important. Your platform is awesome, but rendering really put off my friends. Penflip books look like HTML rendered to a PDF (which I guess they actually are). Did you consider moving to LaTeX-based rendering for PDFs? Markdown -> LaTeX -> PDF is the way to go.
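One way to get a Markdown -> LaTeX -> PDF pipeline is pandoc, which is not mentioned in this thread and is purely an assumption here. The sketch only builds and runs the command line; it needs pandoc and a LaTeX distribution installed, and the file names are placeholders.

```python
import shutil
import subprocess
from typing import List


def pandoc_pdf_command(
    markdown_path: str, pdf_path: str, engine: str = "xelatex"
) -> List[str]:
    """Build the pandoc command line for Markdown -> LaTeX -> PDF."""
    return [
        "pandoc", markdown_path,
        "--from=markdown",
        "--pdf-engine=" + engine,  # pandoc renders the PDF through LaTeX
        "-o", pdf_path,
    ]


def build_pdf(markdown_path: str, pdf_path: str) -> None:
    """Run pandoc; fails early if the executable cannot be found."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_pdf_command(markdown_path, pdf_path), check=True)
```

The look of the result then comes from the LaTeX template rather than from HTML and CSS, which is the difference being argued for above.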

Git is a great selling point, but secondary. Book authors just don't know it yet, even though it's one of those features that you just love when you try.

---

[0] https://leanpub.com/markdown-to-ebook

[1] https://www.gitbook.io/

[2] https://www.penflip.com/

[3] https://arturo.io/

[4] https://leanpub.com/

[5] https://www.softcover.io/

sethish on Aug 23, 2014 | | | [–]

This is a fantastic overview of this part of the publishing space. GITenberg has a mailing list, and needs to start collecting breakdowns like this. I would love it if you would join us: https://groups.google.com/forum/#!forum/gitenberg-project

kaoD on Aug 23, 2014 | | | [–]

Thanks for your kind words, I recently researched the topic and thought someone might benefit from it.

I fail to realize how this could be useful for GITenberg though. Do you intend to publish the books or automate the publishing perhaps? If so, as far as I can tell GITenberg files are not structured, and won't lend themselves easily to automated publishing.

I guess the great thing about GITenberg is anyone could do their own structured .md version and request a pull. Would be cool with some automation to generate and release cool PDFs if .md file is available.

sytse on Aug 23, 2014 | | | [–]

GitLab B.V. CEO here. As indicated by the author of Penflip in the other comment, GitLab is indeed a good start to build something like this, and we welcome initiatives like this.

lazerwalker on Aug 23, 2014 | | | [–]

Crowd-sourcing translations for video games is a fairly common practice these days. I haven't used any of them myself, but people have developed specialized tools to make it easy for community members to contribute translations. Perhaps it's worth looking into what they're using?

justincormack on Aug 23, 2014 | | | [–]

text editing on github is pretty easy now for non git users - it does all the branching and pull reqs for you.

chrisballinger on Aug 23, 2014 | | [–]

Congrats Seth! I had to unfollow you while you were making all those repos because it clogged my feed.

GitHub should really put some work into improving their feed algorithm so one project can't just clog it all.

sethish on Aug 23, 2014 | | [–]

In retrospect, I didn't need to make 80k+ commits with my own account.

FesterCluck on Aug 23, 2014 | | [–]

Has any consideration been given to works which may start in this platform? My wife is an aspiring author, and we'd like more information. I'm sure there are many topics to cover, and we're interested in hearing all of them. However, I specifically wonder about the adoption of open source licensing to such works.

Thanks.

chippy on Aug 23, 2014 | | [–]

I imagine it working similar to the original Project Gutenberg site: http://www.gutenberg.org/

For example, known books by established publishers, but with a self-publishing arm http://self.gutenberg.org/

sethish on Aug 23, 2014 | | | [–]

The main thing that GITenberg could provide authors is a toolset and workflow for using git to store books, and some kind of toolchain to turn them into epub or print-ready pdf.

That toolchain is something I effectively have to build for GITenberg anyway.
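As a rough idea of how small the core of such a toolchain can be, here is a sketch that packs plain-text chapters into a bare-bones EPUB 3 container using only the standard library. It is not GITenberg's build system. The chapter input format (heading plus text with blank lines between paragraphs), the file layout under `OEBPS/`, and the default `modified` timestamp are all assumptions, and the output has not been checked with a validator.

```python
import zipfile
from html import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>{body}</body>
</html>
"""

PACKAGE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""


def build_epub(target, title, identifier, chapters, language="en",
               modified="2014-08-23T00:00:00Z"):
    """Write chapters, given as (heading, plain_text) pairs, to an EPUB file."""
    items, refs, links = [], [], []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as epub:
        # The mimetype entry has to be the first one and stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER)
        for number, (heading, text) in enumerate(chapters, start=1):
            name = "chapter%03d.xhtml" % number
            # Blank lines separate paragraphs in the plain-text source.
            paragraphs = "".join(
                "<p>%s</p>" % escape(" ".join(block.split()))
                for block in text.split("\n\n") if block.strip()
            )
            body = "<h1>%s</h1>%s" % (escape(heading), paragraphs)
            epub.writestr("OEBPS/" + name,
                          PAGE.format(title=escape(heading), body=body))
            items.append('    <item id="ch%d" href="%s" '
                         'media-type="application/xhtml+xml"/>' % (number, name))
            refs.append('    <itemref idref="ch%d"/>' % number)
            links.append('<li><a href="%s">%s</a></li>' % (name, escape(heading)))
        nav = '<nav epub:type="toc"><ol>%s</ol></nav>' % "".join(links)
        epub.writestr("OEBPS/nav.xhtml", PAGE.format(title=escape(title), body=nav))
        epub.writestr("OEBPS/content.opf", PACKAGE.format(
            identifier=escape(identifier), title=escape(title),
            language=escape(language), modified=modified,
            items="\n".join(items), refs="\n".join(refs)))
```

A real toolchain would add cover images, stylesheets, richer metadata and footnote handling, and would run the result through a checker before publishing.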

lucb1e on Aug 23, 2014 | | [–]

For anyone else who finds the font too thin and light to comfortably read, this helps: https://readability.com/bookmarklets

dredmorbius on Aug 23, 2014 | | [–]

Nice, but NB that page is REALLY hard to read.

    body {
        color: black;
        font-weight: normal;
        font-family: verdana;
    }

Helps a lot from my experience.

sethish on Aug 23, 2014 | | [–]

Gah! This website was thrown up quickly. I wasn't intending to post to HN until after I had fixed up the website. Someone beat me to it :-S

Pull requests welcome: https://github.com/GITenberg/gitenberg.github.com

mdturnerphys on Aug 23, 2014 | | | [–]

Sorry about that :-/ A librarian friend posted it yesterday and I thought it worth sharing here. I would have held off if I'd known the creator was on HN. I do feel guilty about racking up all this karma.

notduncansmith on Aug 24, 2014 | | | [–]

Please please please don't use "black". Pure black looks awful compared to a dark grey: #2e2e2e is my preference, but #222 is a close approximation of black that looks very nice as well.

alessiosantocs on Aug 23, 2014 | | [–]

I really love the idea behind this! I think it's a way to disrupt the book industry with all those publishing firms. What's powerful about this is that every person could be heard and her book could easily spread around the globe.

I found https://www.penflip.com/ a few months ago... It isn't focused on building a digital library yet, but what I like about this project is the good execution. It would be nice to merge them together!

ryanackley on Aug 23, 2014 | | [–]

I like the idea of git for ebooks. That being said, a lot of the free books available from project gutenberg have been around for quite some time.

Besides translations, what can people besides the author contribute? Doesn't it, on some level, ruin the character of these books? If you look at a non-fiction book from 80 years ago, is it worth bothering to correct the information when you can probably find it at your fingertips on wikipedia?

gavinpc on Aug 23, 2014 | | [–]

Like others, I am doubtful that this is the best way to go about it.

But to answer your question, the main area where I've found Project Gutenberg's epubs could be improved is in their navigation outline (the toc.ncx file). For example, they often use top-level headings for each line from the title page, then put the entire book under the last line. Whereas other books are closer to what you'd expect, albeit at inconsistent levels of detail. For my project, I abandoned their TOC's altogether and created a simpler format.
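To see the shape of a book's outline, the `toc.ncx` can be flattened into (depth, label, target) rows. This sketch assumes the standard NCX namespace and the usual `navMap` / `navPoint` / `navLabel` / `content` structure; it only reads the outline and does not repair it.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def outline(ncx_xml) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, label, target) for every navPoint, in document order."""
    root = ET.fromstring(ncx_xml)

    def walk(parent, depth):
        for point in parent.findall("ncx:navPoint", NCX):
            label = point.findtext(
                "ncx:navLabel/ncx:text", default="", namespaces=NCX
            ).strip()
            content = point.find("ncx:content", NCX)
            target = content.get("src", "") if content is not None else ""
            yield depth, label, target
            yield from walk(point, depth + 1)

    nav_map = root.find("ncx:navMap", NCX)
    if nav_map is not None:
        yield from walk(nav_map, 1)
```

A run of depth-1 entries that all point into the title page, followed by every chapter at depth 2 under the last of them, would be the pattern described above.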

The images are also at a bare-minimum of resolution. In some cases, higher-quality versions are available in the public domain (such as on Wikimedia Commons). Most of the books are also scanned on archive.org, and so can be referenced there in facsimile. These tend to be higher-resolution scans (although those are all monochrome that I've seen).

For corrections in the works proper, I have occasionally submitted corrections by email but never received a response.

Otherwise, they are perfect, and I thank them for their outstanding work.

EDIT: There are also rare cases (I think Seneca was the one I came across) where the id's are not unique across the book, even if they are within the HTML files. I couldn't find anywhere in the EPUB specification that would require this, yet for practical purposes I think they should be made unique across the book, since the division into HTML files is arbitrary.
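Checking for ids that repeat across the HTML files of one book is mechanical. A sketch, assuming the files have already been read into a mapping of file name to markup (the names in the mapping are whatever the caller chooses):

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List


class _IdCollector(HTMLParser):
    """Collect the value of every id attribute seen in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each id used more than once in the book to the files that use it."""
    seen = defaultdict(list)
    for filename, markup in files.items():
        collector = _IdCollector()
        collector.feed(markup)
        collector.close()
        for element_id in collector.ids:
            seen[element_id].append(filename)
    return {i: names for i, names in seen.items() if len(names) > 1}
```

An empty result means the ids are unique across the whole book, so the files could be merged or re-split without breaking anchors.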

Further to that, there are some PG books that have a unique (serial) ID on every paragraph. Again, this is not required, but it's extremely helpful when it's there (for anchor referencing). It would make the whole library more usable if this were applied consistently, and the serial id's are apparently mechanically applied.

taejo on Aug 24, 2014 | | | [–]

I've noticed several problems recurring in Gutenberg ebooks. Mistakes in the words per se are rare, but:

* substitutes for characters missing from ASCII (e.g. L for £, no proper dashes)

* incorrectly delimited chapter heads (e.g.:

    *CHAPTER 11: THE GREAT*
    BOONDOGGLE George Boondoggle sat on his lawn...

instead of

    *CHAPTER 11: THE GREAT BOONDOGGLE*
     George Boondoggle sat on his lawn...)

* footnotes appearing in the middle of a page (based on the pagination of the original print edition, perhaps)

* missing italics, underlines, etc.

* ASCII-fied equations and diagrams

These do not detract from PG's original goal of being an archive of plain text, and suffice to provide scholars of the 22nd century a good view of what was written in the 19th, but they do detract from the experience of somebody who just wants to read Anna Karenina for fun. (Especially if they are a typography nerd like me)

sethish on Aug 23, 2014 | | | [–]

There are a number of transcription errors in many books. There is an example PR https://github.com/GITenberg/Chess-Strategy_5614/pull/1

atheken on Aug 29, 2014 | | [–]

This is interesting, but I am not sure that I would have done it with multiple repos. Why not build a single repo with a convention for adding/updating works? As it sits right now, there are 2100+ pages of repos. It also means that in order for me to contribute to more than one of these, I'll need to pollute my own account with multiple forked repos.

atheken on Aug 29, 2014 | | [–]

From another perspective, one repo should allow you to gain more traction as all stars/forks/pull requests/commits will be aggregated on it, and thus produce higher visibility on GitHub (and probably anything that scans github stats).

Additionally, using a single repo would allow me to fork and specify my own styles that I want applied to any work I "compile", and these might be hyper-specific.

I'm actually willing to help consolidate these repos if you're willing to go in this direction. I'd also like to hear reasoning for multiple repos if there's something I'm missing.

sethish on Aug 23, 2014 | | [–]

If folks are interested in contributing, the mailing list is here: https://groups.google.com/forum/#!forum/gitenberg-project

gluejar on Aug 23, 2014 | | [–]

One obvious need is for a build system that makes ebook files out of the git-managed source. And what should our source be, anyway?

fiatjaf on Aug 23, 2014 | | [–]

Why don't you add some kind of index/search?

sethish on Aug 23, 2014 | | [–]

Because parsing the original metadata from Project Gutenberg is time consuming to write. I wasn't going to submit it to HN until I had an index/search api, but someone beat me to it.

arafalov on Aug 23, 2014 | | | [–]

But what/where is the metadata? Is it functionally equivalent to Gutenberg's info (e.g. in the RDF dump)? Or something else?

I was looking to write an alternative search for Gutenberg, based on the RDF dump, so would be happy to collaborate/discuss ideas.

sethish on Aug 23, 2014 | | | [–]

Yep. The RDF/XML data. I have a mirror of it on github: https://github.com/sethwoodworth/PG_rdf_metadata

I would love to have a complete python parser for the metadata. I strongly recommend collaborating with the Gutenberg package posted to HN a few weeks ago (and his rdf branch): https://github.com/c-w/Gutenberg/tree/migrate-to-rdf

GITenberg has a mailing list and would love to have you!

https://groups.google.com/forum/#!forum/gitenberg-project
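A starting point for such a parser might look like the sketch below. It is not taken from either linked repository. The element layout it expects (a `pgterms:ebook` element with `dcterms:title` and `dcterms:creator/pgterms:agent/pgterms:name` children) is an assumption about the RDF/XML records and should be checked against real files from the dump.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}


def parse_ebook(rdf_xml):
    """Extract id, title and creator names from one RDF/XML record."""
    root = ET.fromstring(rdf_xml)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    # rdf:about is assumed to end in the numeric book id, e.g. "ebooks/5614".
    about = ebook.get("{%s}about" % NS["rdf"], "")
    creators = ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
    return {
        "id": about.rsplit("/", 1)[-1],
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "creators": [name.text for name in creators if name.text],
    }
```

Subjects, languages and file formats would follow the same pattern, and the resulting dictionaries could feed a search index directly.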

Taylorious on Aug 23, 2014 | [–]

I don't understand the weird obsession with Git. It's a version control system, not the cure for cancer. Anytime someone shoe-horns it into a product they talk about how Git is so amazing and solves all these problems, but what they are really talking about is just a version control system, not Git specifically.

Using Git for just about anything other than what it was built for is a terrible idea. I mean the underlying system is incredibly powerful and could be useful in various projects, but the interface is horrific. I swear it's like someone tried to make Git as difficult as possible to use. Programmers have a hard time understanding and using Git, non-programmers will just laugh and walk away. Every time a programmer has an issue with Git, whoever helps them has to sit down and explain the underlying system for 20 minutes and draw a bunch of sticks and bubbles. Non-programmers will never put up with this.

gavinpc on Aug 23, 2014 | | [–]

(As an aside, I sometimes feel the same way about node.js, where I've seen "node.js is awesome" listed among a project's "features." Nothing against it, I just don't get the obsession.)

I appreciate this comment with respect to Git right now. I've recently spent a lot of "hammock time" trying to come to grips with my views about this profession generally and what I believe is best going forward. One thing I feel strongly about is that while we are still maturing as a field, the pain points are unacceptable. There is still so much work to offload to the machine, requiring fundamental rethinking at many levels. So although I agree in principle with the initiative to help people "learn to code" (so that we can bring system design closer to the domain experts), I also believe that in the current state of things, it's a wasteful effort, since it requires conveyance of ideas that should be deprecated.

But even short of programming, version control alone would be useful in so many other fields. There's no reason why it shouldn't be a mainstream concept even for personal use (e.g., you're working on a thesis). Just an hour ago, during my annual flirtation with Git (I'm a Mercurial user), I wrote in my notes:

> the barrier to entry for new programmers is important. This would appear to weigh in favor of Mercurial — and yet, realistically, is a “layman,” i.e. someone who knows nothing about software development and has never used a CLI, really going to distinguish between these two systems, or will the very concepts of a VCS not prove to be the biggest hurdle?

I have used Git, and I think that for linear history the differences are not remarkable. But the attitude you refer to is crucial: do we want to hide complexity or expose it?

Incidentally, I have several Project Gutenberg epubs under version control for a personal project, and like the OP I attest that their work is first-rate. There's no comparison to any other digitizer in the public domain (that I know of).

hhsnopek on Aug 23, 2014 | | | [–]

> Every time a programmer has an issue with Git, whoever helps them has to sit down and explain the underlying system for 20 minutes and draw a bunch of sticks and bubbles.

This isn't true at all for a lot of people. I know a lot of people that just read the docs and are able to solve the issues. Others will Google the problem and find the solution on stack overflow. Everyone learns differently...

> Anytime someone shoe-horns it into a product they talk about how Git is so amazing and solves all these problems, but what they are really talking about is just a version control system, not Git specifically.

Git is amazing and does solve a lot of problems, but there are problems that aren't solved by Git. Even Linus himself says this here: (https://www.youtube.com/watch?v=4XpnKHJAok8).

Using the github API, rather than git, for creating epub books and pdfs is great. Using git to control changes as they do is perfect as well.

> Non-programmers will never put up with this.

Ermm don't assume that everyone gives up right away. With the GUI interfaces we have today, Git is really simple once you learn it.

recursive on Aug 23, 2014 | | | [–]

> Git is really simple once you learn it.

Pretty much everything is simple once you learn it. That's what learning is. But git certainly doesn't go out of its way to make that process easy.

Dylan16807 on Aug 23, 2014 | | | [–]

>Pretty much everything is simple once you learn it.

I wouldn't say so. A lot of things are designed-by-committee implemented-by-the-lowest-bidder messes that are painful and complex even once you know how they work.

Git may have some weird design decisions but for the most part it's well-implemented and follows a simple conceptual model.

sethish on Aug 23, 2014 | | | [–]

I can't think of a VCS with a better online tool than git and github. With editing books, it is entirely possible to use only the github editor, which effectively abstracts the git command line interface.

robert_tweed on Aug 23, 2014 | | | [–]

Git, in spite of the horrifically complex interface, is in essence a really dumb version control system (I mean that in a non-insulting way). This means it's fairly neutral about what kinds of data you can throw into version control. And more importantly, it almost never complains about what you give it.

I think that's why people are now starting to think about applying version control to domains outside of code and choosing Git to do it. For example, I had an idea a few years ago to make a CMS on top of Subversion as the data store (never got around to building it though). Now there are lots of projects like that built on top of Git: CMS, Wikis, you name it. Generally anything that can work off flat files is very easily converted to use Git as a back-end, giving you advanced version control features more or less for free.

From a practical perspective, the difference is not just that Git is a trendy new silver bullet, it's the "dumbness" that makes it actually easier to do that kind of work than it would be on older version control systems like SVN. Interestingly, for the most part, most of these projects do not really benefit from the distributed nature of Git (although for things like wikis and CMSs it can offer yet another feature: content migration between instances). It's more about the ease of use for getting data into a repository and under version control without it exploding when something unexpected happens, like a file getting renamed.
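To illustrate the "dumbness" point about renames, here is a minimal Python sketch of how Git identifies file contents. It assumes Git's default SHA-1 object format, where a blob's ID is the hash of a short header plus the raw bytes; the file names and text are made up for the example, and `git_blob_id` is a hypothetical helper, not part of any Git library.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git's default SHA-1 format gives a blob.

    The ID covers a "blob <size>\\0" header plus the raw bytes. The file
    name is not part of the input, so it cannot affect the result.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working trees: file name -> file bytes.
tree_before = {"chapter1.txt": b"Call me Ishmael.\n"}
tree_after = {"chapter-01.txt": b"Call me Ishmael.\n"}  # same text, renamed

ids_before = {git_blob_id(data) for data in tree_before.values()}
ids_after = {git_blob_id(data) for data in tree_after.values()}

# True: the rename leaves the stored blob untouched.
print(ids_before == ids_after)
```

Because the ID depends only on the bytes, a renamed file maps to the same stored object, and the store never needs to know whether those bytes are code, prose, or an image.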

You might not get the best front-end experience for actually doing stuff with that version history (as other comments have noted wrt diff tools, etc., which tend to be geared towards code rather than other types of content) but that's the fault of those tools rather than Git (which is dumb enough not to care about content types), so it's just a question of incrementally building up a better toolset for your particular content domain. That's much easier to do and more approachable than building the whole infrastructure from scratch.

As for Github, it happens to have a nice interface, toolset, documentation and mindshare. Developers are familiar and comfortable with it, so there's no need to research and learn "yet another tool". And because it's cloud based, you can get up and running very quickly without worrying about hosting, etc. That's just more icing on the cake really.

Bluestrike2 on Aug 23, 2014 | | | [–]

Writing version control software is hard. There are so many potential use cases, not to mention differing perspectives on how users interact with each use case and how each applies to the specific project a user is working on, that increased complexity is inevitable. On the whole, I think Git manages to strike a good balance with most things, even with its unique eccentricities.

Git's popularity isn't because it's the best tool out there for all scenarios. It's popular because it's a distributed system that helped communities grow around code managed with it while removing barriers to entry. In my opinion, that more than anything will be Git's lasting legacy.

collyw on Aug 23, 2014 | | | [–]

I have asked before: how many projects need a distributed version control system? It adds extra complexity to the concepts, and is probably rarely needed. (I mean distributed specifically, not remote version control, which I can see being useful.)

justincormack on Aug 23, 2014 | | | [–]

You can't do anything that works offline without a distributed system.

You can use a distributed vcs as a centralized one if you want.

collyw on Aug 24, 2014 | | | [–]

Save locally and push when you get online again.

TeMPOraL on Aug 23, 2014 | | [–]

> Every time a programmer has an issue with Git, whoever helps them has to sit down and explain the underlying system for 20 minutes and draw a bunch of sticks and bubbles. Non-programmers will never put up with this.

But from my experience, they have to do this exactly once per (non-stupid) programmer. The moment you grok the underlying structure (basically all graphs and pointers), the apparent complexity disappears and most of the things in git become obvious. I see no problems with explaining this to non-programmers as well; you just have to spend a little more time, because they probably aren't used to thinking in terms of graphs.
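The "graphs and pointers" model can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Git is implemented: the `Repo` and `Commit` classes are hypothetical, commit IDs are simple counters rather than content hashes, and merges are left out. It only shows that commits are nodes pointing at their parents and that a branch is a named pointer to one commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # IDs of parent commits (empty for the first commit)


class Repo:
    """Toy model: a commit graph plus branch names that point into it."""

    def __init__(self):
        self.commits = {}    # commit ID -> Commit
        self.branches = {}   # branch name -> commit ID (the "pointer")
        self.head = "master"  # name of the branch currently checked out

    def commit(self, message):
        parent = self.branches.get(self.head)
        parents = (parent,) if parent is not None else ()
        cid = f"c{len(self.commits) + 1}"  # assumption: counter, not a hash
        self.commits[cid] = Commit(message, parents)
        self.branches[self.head] = cid  # committing just moves the pointer
        return cid

    def branch(self, name):
        # A new branch is only a second pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name

    def log(self, name):
        # Walk first-parent links from the branch tip back to the root.
        history = []
        cid = self.branches.get(name)
        while cid is not None:
            history.append(cid)
            parents = self.commits[cid].parents
            cid = parents[0] if parents else None
        return history


repo = Repo()
repo.commit("first draft")
repo.branch("typo-fix")
repo.checkout("typo-fix")
repo.commit("fix typo")
repo.checkout("master")
repo.commit("add preface")

print(repo.log("master"))    # ['c3', 'c1']
print(repo.log("typo-fix"))  # ['c2', 'c1']
```

Both branches share the commit `c1`, which is the stick-and-bubble diagram in data form: two pointers, one graph.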