A look back: Bram Cohen vs. Linus Torvalds (2007)

solutionyogi · on Aug 1, 2014

This article brings back memories. Back in 2007, I had just watched Linus' presentation on Git at Google (http://www.youtube.com/watch?v=4XpnKHJAok8) where he called all non-distributed version control systems useless. I could not make any sense of DVCS from his talk. I tried to play with Git and it was extremely frustrating due to the poor CLI. I thought maybe Git was just a fad. But then more and more people kept talking about how awesome it was.

This was one of the articles where Git finally clicked for me. The key quote:

There is no need for fancy metadata, rename tracking and so forth. The only thing you need to store is the state of the tree before and after each change. What files were renamed? Which ones were copied? Which ones were deleted? What lines were added? Which ones were removed? Which lines had changes made inside them? Which slabs of text were copied from one file to another? You shouldn't have to care about any of these questions and you certainly shouldn't have to keep special tracking data in order to help you answer them: all the changes to the tree (additions, deletes, renames, edits etc) are implicitly encoded in the delta between the two states of the tree; you just track what is the content.

I have been using Git for 7 years now, and I can't imagine how I ever worked with version control that didn't work on the entire tree.
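The quoted idea can be sketched in a few lines of Python. This is only an illustration, not how Git is implemented: the names `snapshot` and `infer_changes` are made up, trees are plain dicts, and renames are matched on identical content only, whereas Git also uses similarity heuristics.

```python
import hashlib


def snapshot(files):
    """Map each path to a hash of its content; this is the whole 'tree state'."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}


def infer_changes(before, after):
    """Derive adds, deletes, edits and exact renames from two snapshots alone."""
    removed = {p: h for p, h in before.items() if p not in after}
    added = {p: h for p, h in after.items() if p not in before}
    edited = sorted(p for p in before if p in after and before[p] != after[p])

    renames = []
    for new_path, new_hash in sorted(added.items()):
        # A removed path with identical content is treated as the rename source.
        for old_path, old_hash in sorted(removed.items()):
            if old_hash == new_hash:
                renames.append((old_path, new_path))
                del removed[old_path]
                break
    for _, new_path in renames:
        del added[new_path]

    return {
        "renamed": renames,
        "added": sorted(added),
        "deleted": sorted(removed),
        "edited": edited,
    }


before = snapshot({"greeting": b"hello\n", "README": b"v1\n", "old.txt": b"x\n"})
after = snapshot({"saludo": b"hello\n", "README": b"v2\n", "new.txt": b"y\n"})
changes = infer_changes(before, after)
```

No rename was ever recorded here; `greeting` to `saludo` falls out of comparing the two states, and a smarter `infer_changes` could be swapped in later without touching the stored snapshots.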

shubhamjain · on Aug 1, 2014

The brilliant thing about Linus that never ceases to amaze me is his level of knowledge and how he is never 'wrong'. He has always defended his decisions, maybe in an arrogant tone, against countless arguments, and each one stands tall.

Lately, Linus announced the use of the Git object database format for Subsurface[1]. One of the respondents asked, "Why not use JSON?" Linus defended the choice well, saying that putting everything in one file was not great. So, even though he is not a web guy, he was still aware of why the Git object format had more merits than the alternatives.

[1]: https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV

ayrx · on Aug 1, 2014

"Me _personally_, I want to have something that is very repeatable and non-clever."

This is what all software engineers should aim for.

colanderman · on Aug 1, 2014

Ironic from Torvalds, given the haphazard way some git commands interpret their arguments. Maybe he didn't write those.

atmosx · on Aug 1, 2014

> I knew Torvalds was smart, but seeing as I was never really more than an occasional Linux user I never realized just how smart;

Hm, when a guy writes his own kernel he is smart. I mean, as far as implementing goes, as smart as it gets. The amazing thing is that he was pretty young when he did it (1991-2). And then, there is this[1]. When people talk about "hackers", Linus is the first person that comes to mind.

[1] http://lwn.net/2000/0824/a/esr-sharing.php3

Erwin · on Aug 1, 2014

Around 20 years ago when I was taking my computer education, low-level programming was all there was.

OS programming from absolute scratch was nothing special by the standards of what was done in those days (some years later you had OS toolkits and a huge number of tools to make that far easier, like virtualization; 20 years ago we could maybe beep to debug our programs). Many in the programme grew up on the Commodore or Spectrum, which also meant a lot of low-level tricks.

So Linux 0.1 didn't really have any amazing contributions to computer science (on the contrary, you may recall the famous Tanenbaum-Torvalds thread on microkernels vs monolithic kernels). It was pragmatic and, quite quickly, useful.

I think where Linus did extremely well was a) successfully managing a huge number of contributions while being highly technically involved and b) relentlessly changing the internal design to improve it. If Linux had been a commercial product, there'd be a lot of senior people greatly invested in their own designs who'd be unwilling to modify them.

For comparison, here's another famous kernel programmer who has the technical skills, but not the collaboration skills: http://www.templeos.org/

njharman · on Aug 1, 2014

> So Linux 0.1 didn't really have any amazing contributions to computer science

That's a key point. Linus is not a Computer Scientist, he is a programmer. CSs advance the theory of computation. Progs make shit we can use.

tzs · on Aug 1, 2014

> Hm, when a guy writes his own kernel he is smart. I mean, as far as implementing goes, as smart as it gets. The amazing thing is that he was pretty young when he did it (1991-2)

I'm a fairly anti-social person. I don't know many people. Yet at the time Linux came out, I personally knew a dozen people who could easily have written it when they were young.

So why didn't they?

Several of them had satisfied their urge to hack on operating systems by getting jobs hacking and porting Unix (and a couple of them "ported" Unix by essentially writing a new implementation).

The others who could have done it had no need for it. They all had easy access to Unix workstations and Unix VAXes, and were busy dealing with their urges to hack on other things like graphics or AI or networks or scientific computing.

The amazing thing about Linus is not his considerable technical ability--plenty of people have that--but rather his management ability. As I said earlier, I know at least a dozen people who could have written a kernel...but I don't think any of us could have taken it from a one man kernel to a worldwide project with hundreds of contributors.

In a hundred years, Linus Torvalds will have a footnote in technical textbooks, and a whole chapter in business textbooks.

BrandonM · on Aug 1, 2014

> [1] http://lwn.net/2000/0824/a/esr-sharing.php3

Here's the thread context, for anyone else who is curious how Linus responded: http://lkml.iu.edu//hypermail/linux/kernel/0008.2/0240.html

For the record, he didn't seem to address esr's email.

ethomson · on Aug 1, 2014

I'm surprised that the author of this post would point out a rename conflict as something that "git gets right", in part because I'm relatively certain that git-merge-recursive did not exist when this mailing list exchange occurred (I'm actually surprised that it was the default already in 2007) and git-merge-resolve would have done something completely different, treating `greeting` as deleted in both and `saludo` as added in left. There would be no conflicts and `saludo` would merrily be created, which seems like the wrong thing.

But I'm mostly surprised because rename conflicts are this transient thing. git-merge-recursive will detect a rename conflict, but you're hosed when it comes time to resolve it, since the information that it's a rename conflict isn't captured anywhere except, briefly, in the phosphors of your CRT.

In the author's example, when you run `git status`, it will simply tell you that `saludo` was added by them. Which is exactly the behavior of the rename-deficient git-merge-resolve. The expectation in resolving this, I suppose, is that you saw the message that this was a rename/delete conflict, remembered the original filename and could somehow make a decision based on that.

This is not terrible in a rename/delete conflict, but for some other types of rename conflicts, it's much more difficult. For example, branch 'A' renames a file from 'foo' to 'bar', branch 'B' renames it from 'foo' to 'baz'. Now you have two files in your working directory and git-status can only tell you that they were each added, which is not indicative of a conflict.

This is annoying for a user on the console. This is impossible for somebody trying to build a UI to resolve a merge conflict: 'bar' was added in one of the branches... why does this conflict? Well, if it's only on one side of the merge, then it must have come from some rename conflict. But with which other file? What's the common ancestor that git-merge-recursive decided was a rename? Meh.
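To make it concrete, here is roughly what a tool would have to reconstruct for itself from the three trees. It's a hedged Python sketch, not git-merge-recursive: `find_renames` and `rename_conflicts` are hypothetical helpers, trees are dicts of path to content hash, and only exact-content renames are detected.

```python
def find_renames(base, side):
    """Return {old_path: new_path} for files whose content moved to a new path."""
    gone = {p: h for p, h in base.items() if p not in side}
    new = {p: h for p, h in side.items() if p not in base}
    renames = {}
    for old_path, old_hash in sorted(gone.items()):
        for new_path, new_hash in sorted(new.items()):
            if new_hash == old_hash and new_path not in renames.values():
                renames[old_path] = new_path
                break
    return renames


def rename_conflicts(base, ours, theirs):
    """List (kind, base_path, our_path, their_path) for conflicting renames."""
    ours_renames = find_renames(base, ours)
    theirs_renames = find_renames(base, theirs)
    conflicts = []
    for old_path in sorted(set(ours_renames) | set(theirs_renames)):
        our_path = ours_renames.get(old_path)
        their_path = theirs_renames.get(old_path)
        if our_path is not None and their_path is not None:
            # Both sides moved the file; it only conflicts if the targets differ.
            if our_path != their_path:
                conflicts.append(("rename/rename", old_path, our_path, their_path))
        elif our_path is not None and old_path not in theirs:
            conflicts.append(("rename/delete", old_path, our_path, None))
        elif their_path is not None and old_path not in ours:
            conflicts.append(("rename/delete", old_path, None, their_path))
    return conflicts


base = {"foo": "h1"}
branch_a = {"bar": "h1"}   # 'foo' renamed to 'bar'
branch_b = {"baz": "h1"}   # 'foo' renamed to 'baz'
conflicts = rename_conflicts(base, branch_a, branch_b)
```

The point is that the tuple tying 'bar' and 'baz' back to 'foo' is exactly the information a conflict-resolution UI needs, and it has to be recomputed because nothing persists it.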

(Please do not mistake this rant as a suggestion that Codeville's merge is superior to Git's. I'm not suggesting that, just that git-merge-recursive has a few rough edges that could use polish.)

wincent · on Aug 1, 2014

That's exactly the point: Git didn't handle the rename conflict so well at the time of the mailing list exchange, but it did handle it better by the time the blog post was written. And it may handle it better still in the future, precisely because the repo format isn't laden with metadata[0], and the handling of edge cases like this can be improved by evolving the heuristics that Git uses to infer what happened.

It's a bet that "future self" (improving heuristics) will be more effective than "past self" (attempting to design a future-proof repo format). It looked like the bet was paying off in 2007 when the blog post was written, and 7 years later that still seems to be the case.

[0] Metadata which would need to be carefully managed for compatibility across versions, and which would be missing any time the user forgot to explicitly record it (with a Git command) and instead made a change directly to the worktree.

ethomson · on Aug 1, 2014

Yeah, we're in agreement about that. The simplicity of the git repository is very nice. The repository format is a thing so beautiful that it makes you want to cry.

With a few horrible warts thrown in that make you actually cry.

wirrbel · on Aug 1, 2014

I think this is a good example of complexity management. Linus has a bottom-up approach to this. With a few building blocks you build up a system where you can define and work with simple algorithms that are both understandable and approachable by a single human mind.

The underlying assumption is that simple approaches can lead to an "easy" solution. To contrast this with a complex algorithm, a complex algorithm is in a lot of cases harder to implement and reason about.

I would like to object that, generally, you cannot assume that simple means easy and complex means hard; there are complex systems that actually turn out to be easy to reason about and simple systems that turn out to be quite hard.

I actually would not be surprised if the next generation of VCS features more complexity than Git, to make working with rewritten history easier and to pave the way for certain workflows that Git makes possible but not convenient. Then I hope that these approaches will be complex but easy.

PS: Subversion, for example, is a case of complex and hard. While the interface of Subversion aims at being quite easy and usable, the implementation is very complex, with a lot of corner cases, exceptions and an abundance of leaky abstractions. It is a prime example of top-down design gone wrong.

jobigoud · on Aug 1, 2014

The two approaches also exist for in-application undo/redo stacks.

You can either try to track the operation you did that will need to be undone, or you can track the state of the document prior to the change, whatever change that be. I have found the second approach to be more robust and simpler to think about.
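A minimal Python sketch of the second approach, assuming the document is small enough to deep-copy (`SnapshotHistory` is just an illustrative name):

```python
import copy


class SnapshotHistory:
    """Undo/redo by remembering whole prior states instead of inverse operations."""

    def __init__(self, document):
        self.document = document
        self._undo = []
        self._redo = []

    def apply(self, change):
        # Save the prior state, then run any mutation; no inverse is required.
        self._undo.append(copy.deepcopy(self.document))
        self._redo.clear()
        change(self.document)

    def undo(self):
        if self._undo:
            self._redo.append(self.document)
            self.document = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.document)
            self.document = self._redo.pop()


history = SnapshotHistory({"title": "draft", "lines": ["a"]})
history.apply(lambda doc: doc["lines"].append("b"))
history.apply(lambda doc: doc.update(title="final"))
history.undo()  # back to the state before the title change
```

Note that `apply` accepts any function at all; the history never needs to know what the change was, only what the document looked like before it.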

anon4 · on Aug 1, 2014

The first approach is mostly an optimisation when you need to operate in a memory-tight environment and can't afford to keep several complete copies of past states.

xorcist · on Aug 1, 2014

Not necessarily. The point here is that you reason about the complete state, not that you store it as-is.

See for example git itself which has grown quite an efficient storage system despite the design ideas being as described in the article.
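As a rough sketch of why that works (Python, with a made-up `ObjectStore` class; real Git goes further with zlib compression and delta-compressed packfiles): if every snapshot is a mapping from path to content hash, unchanged files cost nothing extra to "store again".

```python
import hashlib


class ObjectStore:
    """Stores each distinct content once, keyed by its hash."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(key, data)
        return key

    def commit_tree(self, files):
        """A snapshot is a mapping of path -> blob key for the whole tree."""
        return {path: self.put(data) for path, data in files.items()}


store = ObjectStore()
v1 = store.commit_tree({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
v2 = store.commit_tree({"a.txt": b"alpha\n", "b.txt": b"beta v2\n"})
blob_count = len(store.blobs)  # a.txt is shared between v1 and v2
```

Both `v1` and `v2` describe the complete tree, yet only the changed file adds a new blob.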

AnimalMuppet · on Aug 1, 2014

Linus didn't just wake up one day with these ideas. He'd been using source code control systems for a while on a huge project (the kernel) and had been growing dis-satisfied with what they did. He knew, by direct experience, what he wanted to be different, and why.

riffraff · on Aug 1, 2014

And yet, at the time I was hoping we'd get darcs-like cherry picking, and 7 years later the incumbent VCS still doesn't :(

sanderjd · on Aug 1, 2014

Care to elaborate on what is better about darcs' cherry picking?

defen · on Aug 1, 2014

In git, a cherry pick essentially just looks at the diff specified by that revision, then applies that diff as a new commit. The system doesn't record the context (unlike with a merge where you at least know the parents).

darcs' system is way too complicated to get into in a short comment reply, but the basic idea is that if you cherry-pick a commit you get all the context along with it. That's because darcs stores a series of patches rather than a series of tree states.

One nice thing about git's way is that since it's just pulling a diff, you can cherry-pick from anything, (e.g. add a remote that's a totally separate unrelated project) as long as the diff applies cleanly.
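A simplified Python sketch of the git side of this, assuming trees are dicts of path to content and working at whole-file granularity rather than line-level hunks (`diff` and `cherry_pick` are illustrative names, not Git's code):

```python
def diff(parent, commit):
    """File-level delta: path -> (old_content, new_content); None means absent."""
    paths = set(parent) | set(commit)
    return {
        p: (parent.get(p), commit.get(p))
        for p in paths
        if parent.get(p) != commit.get(p)
    }


def cherry_pick(target, parent, commit):
    """Apply the parent->commit delta onto target, returning a new tree."""
    result = dict(target)
    for path, (old, new) in diff(parent, commit).items():
        if target.get(path) != old:
            raise ValueError(f"conflict: {path} differs from the commit's parent")
        if new is None:
            del result[path]
        else:
            result[path] = new
    return result


parent = {"lib.py": "v1"}
commit = {"lib.py": "v1", "fix.py": "patch"}
target = {"lib.py": "v2-unrelated", "app.py": "main"}  # could be another project
picked = cherry_pick(target, parent, commit)
```

Only the delta travels: `fix.py` lands in `target` even though `lib.py` there is a different version, and nothing in `picked` records which commit the change came from or what else it depended on.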

sanderjd · on Aug 1, 2014

Thanks for the explanation - very interesting!

riffraff · on Aug 1, 2014

From 10000 feet, cherry picking in git is "get me this single commit from that branch"; in darcs it's "get me this commit and every other one needed to get it to apply correctly in this one".

pjungwir · on Aug 1, 2014

This makes me think it'd be handy to embed git into a desktop application and use it as the datastore. But I suppose the GPL prevents this unless the app is open source.

rakoo · on Aug 1, 2014

Embedding git, the application (or even the library) itself, can be difficult, as shown by github's experience [0] (I guess they know what they're talking about).

What you can do, on the other hand, is use the git format. There's already something pure-python [0] and something pure-go [1], and I'm pretty sure the same exists for other languages.

Oh and by the way, the pure-python I linked to is used for bup, a backup tool that stores its data in git format. Because it's extremely efficient.
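To give an idea of how small the core of the format is, here is a sketch in Python of how a blob's object id is derived and how a loose object is encoded (function names are mine; this covers blobs only, not trees, commits or packfiles):

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>\\0' followed by the raw content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def encode_loose_object(content: bytes) -> bytes:
    """Loose objects are the same header + content, zlib-compressed."""
    return zlib.compress(b"blob %d\0" % len(content) + content)


# The empty blob has the well-known id e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
empty_id = git_blob_id(b"")
encoded = encode_loose_object(b"hello\n")
```

A loose object would then live under `.git/objects/` in a directory named after the first two hex digits of the id, with the remaining digits as the file name.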

[0] https://github.com/bup/bup/blob/master/lib/bup/git.py

[1] https://github.com/speedata/gogit

ash · on Aug 1, 2014

> as shown by github's experience [0]

Missing link? bup project is not related to github.

rakoo · on Aug 4, 2014

Woops ! I was talking about this one: https://speakerdeck.com/tanoku/my-mom-told-me-that-git-doesn...

baldfat · on Aug 1, 2014

Linus = Genius. People who take his personality first miss the man that really is more like the public persona of Steve Jobs than the actual Steve Jobs.

jeremysmyth · on Aug 1, 2014

Did you read the article? I don't believe the article is "personality first" at all.

There's little in there that highlights his personality aside from a throwaway comment that the author believes they're both somewhat arrogant. In fact, in all the Linus quotes in the article there's not even a single shred of arrogance.

wirrbel · on Aug 1, 2014

The article does not claim that they were arrogant in this discussion. It was very much not an article of the "Linus-Torvalds-is-rude" kind but more of a comparison of visions.

In a way, Torvalds showed with git that he is a good software engineer by putting together established techniques to form an excellent "product". He did not get side-tracked by reinventing the wheel but focused on a useful feature set and a performant implementation. A very good job indeed.

baldfat · on Aug 1, 2014

I actually never said anything about arrogant BUT the author did in his introduction. I only said "Personality First"

dpark · on Aug 1, 2014

Was there actually any doubt about Linus's software engineering ability? He created and now manages the most successful kernel in the world.

baldfat · on Aug 1, 2014

Did you read the first paragraph?

Thanks for keeping up the down votes when I point out something that is clearly in the article BUT someone decides I am stretching something ~~~

"Now, I've never had a particular liking for either of these personalities, although I've had to recognize that they're very clever individuals. Both of them have been known for occasional demonstrations of arrogance."

baldfat · on Aug 5, 2014

Another unnecessary down vote.