In praise of git’s index

pilif · on Jan 25, 2010

Using the git index (and of course "add -p" and "rebase -i", which are closely related), I managed to increase the quality of my commits by a huge margin.

What is the value of that extra work to make commits "nice"?

It makes everything related to the history fun: merges, blames, sending around and applying patches or even just browsing the history.

But it's not just the aspect of fun: Having a clean history where each commit does one thing and only one thing saves you huge amounts of work when you are trying to find a bug in the code or when you are trying to understand a change another person made.

I can't provide numbers, but it's my gut feeling that 5 minutes of additional work while committing can easily amount to 30 minutes of effort saved later on - at least for the person having to sort through the history doing merges or changelog updates - which in this case happens to be me.

Of course, I got really quick at committing even huge amounts of work in the form of really small localized patches (it took me months, though, to get quick and error-free), so what takes me 5 minutes might take a beginner to the concept of the git index 30 minutes or more - but this is just the same with all new concepts of development that you are learning.

I would never want to go back to the ugly world of mashing stuff together that is Subversion, for example. I even get an uneasy feeling the rare times I do a "git commit -a".

timf · on Jan 25, 2010

"Having a clean history where each commit does one thing and only one thing saves you huge amounts of work when you are trying to find a bug in the code"

Especially when employing 'git bisect': http://www.kernel.org/pub/software/scm/git-core/docs/git-bis...
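To make the connection between one-thing-per-commit and bisecting concrete, here is a toy sketch of the binary search that 'git bisect' performs. It is not git's implementation: it assumes a strictly linear history, a first commit known to be good, a last commit known to be bad, and a hypothetical `is_bad` check standing in for "build and run the test". The commit names are made up.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical history of 16 commits where the bug arrives in "c11".
history = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record every commit we had to check out and test
    return int(commit[1:]) >= 11


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after testing", len(tested), "commits")
```

The search only narrows things down to a single commit, so the smaller and more focused that commit is, the less there is left to read once it is found.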

pilif · on Jan 25, 2010

Right. I wanted to add that after writing my comment and before posting it, but I got distracted and then forgot.

Thanks for the pointer.

(Small anecdote about git bisect: I was talking about "git bisect" at work, and a coworker clearly not interested in the marvel that is git bisect commented: "bisect my ass". I probably don't have to explain to you how often we are now using that phrase in various non-git related contexts.)

apag · on Jan 25, 2010

Good comment. I just want to react to the last sentence.

I use `git commit -a` plenty and without compunction. There are many times where a tweak or very simple feature doesn’t involve any exploration, and in those cases I rarely need the elaborate staging dance to make a sensible commit. I review the patch and find there is no need to break it down… why go through the index in that case?

The index is just a tool, and so is `-a`. Judge their use, not the tools themselves.

(Of course, sometimes I realise belatedly that a simple feature commit is not as straightforward as it seemed. Well, it would be belated if we weren’t talking about git. Rebasing (interactively, in particular) saves the day in that case. Once again: git goes out of its way to get out of mine. This is really the central point.)

sorbits · on Jan 25, 2010

my gut-feeling that 5 minutes of additional work while committing can easily amount to 30 minutes of effort saved later

I don’t need to rationalize being anal about my commits, most of which I never visit again ;)

pilif · on Jan 25, 2010

YOU might not. But you might actually be nice to your coworkers (or open source project members) that do.

It's just a matter of basic politeness to others.

And in fact, I would reject a patch to any of my code out there that doesn't clearly separate different things into different patches.

Too many times in my SVN days I witnessed strange bugs appearing in some commits that otherwise only contained whitespace changes labelled with the all-famous commit message "fixes".

Never again.

sorbits · on Jan 25, 2010

Not sure why I was downvoted and get these replies.

I wasn’t being sarcastic, I do spend lots of time making my commits (and everything else for that matter) ready for inspection by her royal queen — I know others with this same behavior, and my comment was merely meant to say “don’t rationalize it” — I do it to feel good, well, to not feel bad (about “leaving a mess”) is probably closer to the truth.

bdr · on Jan 26, 2010

I think people interpreted your comment as saying "You won't look at most commits ever again, so don't be so anal."

dasil003 · on Jan 25, 2010

However when one causes a bug that you discover a few months later, being able to isolate the commit quickly could save you hours or days.

apag · on Jan 25, 2010

That’s true, most of the time it doesn’t really matter.

But it’s like properly formatting and indenting your source. Any one particular instance of sloppiness is irrelevant, but in the aggregate, the care you put into it really makes a difference in the feel of your work.

wrs · on Jan 25, 2010

One great thing about Git is all its support for using source control as a communication medium, not just a history of snapshots. So often, a source control log is like a landfill you really don't want to have to dig up. Git commits (done correctly) are intended to be read by humans. If you don't get this, then you don't get why things like the index are important.

apag · on Jan 25, 2010

Yes!! Exactly. Thank you for putting in words something that was too hazy in my mind to even think of expressing it.

Would you mind giving your name if I add this to the article as a quote?

seiji · on Jan 25, 2010

hg version: enable the [record] extension then do 'hg record': http://mercurial.selenic.com/wiki/RecordExtension

Also, please never use "f.ex" to mean "for example" ever again.

ramen · on Jan 26, 2010

He didn't. He used <abbr title="for example">f.ex.</abbr>. You know, an abbreviation.

apag · on Jan 26, 2010

OK, Mr. PickyPants, I went and fixed my usage. Happy now? :-)

(Thanks for the nudge.)

vdm · on Jan 26, 2010

What happened to e.g.?

jemfinch · on Jan 26, 2010

Alas, the Romans took it back.

herdrick · on Jan 25, 2010

This will stash the changes you see using git diff, but not the ones you have staged, so it will leave the code on disk exactly the same as the index.

"on disk"... I guess he meant in your working directory of code?

apag · on Jan 25, 2010

Yes, that’s what I meant. I tried to keep the terminology simple (there were several other places where I avoided jargon) because I consider the article’s target audience to be people who aren’t already deeply familiar with the subject.

herdrick · on Jan 25, 2010

Thank you for this great post. It and this: http://news.ycombinator.com/item?id=1063198 have convinced me to switch to git.

100k · on Jan 25, 2010

I hate the git index but his description of how conflicted merges work with git (only showing diffs of the conflicts) is compelling.

utx00 · on Jan 25, 2010

one can do the exact same thing with hg queues.

Deestan · on Jan 25, 2010

Not really. In my subjective(+) opinion, Hg Queues are designed to work around the suboptimal design of branches in Hg.

Mucking about with a lot of small commits is a Good Habit and very useful when working from two different computers, but it makes the Hg repository log terribly noisy. E.g., it invariably results in several "Oops, I forgot to add imgs.idx in the last changeset." or "This does not compile, but I committed it for backup purposes because I'm updating from Vista to Win7. Please check out the next changeset if you want to compile.". Git naturally supports squashing of commits to solve this, while Hg sans Queues does not. Git also allows you to kill off experimental branches that lead nowhere, while in Hg they live forever unless you make the "branch" in a queue ecosystem.
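As a rough illustration of what squashing buys you, the sketch below collapses a run of small commits into a single net change. It is a toy model, not how git stores or rewrites anything: a snapshot is assumed to be a plain dict of path to content, a commit a dict of path to new content (or `None` for a deletion), and the file names and helper functions are hypothetical.

```python
def apply(snapshot: dict, commit: dict) -> dict:
    """Return a new snapshot with one commit's changes applied.

    A commit maps path -> new content, or None to delete the path.
    """
    result = dict(snapshot)
    for path, content in commit.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def squash(base: dict, commits: list) -> dict:
    """Collapse a series of commits into one commit holding only the net change."""
    final = base
    for commit in commits:
        final = apply(final, commit)
    net = {}
    for path in set(base) | set(final):
        if base.get(path) != final.get(path):
            net[path] = final.get(path)  # None means the path ends up deleted
    return net


base = {"main.c": "v1"}
work = [
    {"main.c": "v2 (does not compile)"},  # backup commit
    {"main.c": "v2"},                     # the fix that makes it compile
    {"debug.log": "scratch"},             # committed by accident
    {"imgs.idx": "index"},                # "Oops, I forgot to add imgs.idx"
    {"debug.log": None},                  # accident removed again
]
single = squash(base, work)
print(single)
```

Five noisy steps become one change touching two files, and the file that was added and removed again leaves no trace.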

I will also pre-emptively counter a counter-argument which always tends to come up when I talk about this: "Swallow your pride; allow people to see your mistakes." It's not about that; it's about keeping the repository history somewhat readable. Next week, someone on my team may have to look through all my changes for the code review, or in a few months when I'm on holiday someone might want to know what all the rework between versions 2.5.2 and 2.5.3 was so they can double-check that the release notes have been updated. Leaving these people to wade through piles of "Ooops"-es and half-commits is just sloppy and unprofessional.

(+) Disclaimer: I use Hg extensively at work, and Git extensively at home. I am frequently branching, merging, cloning, rebasing, squashing and generally mucking about in both, so I believe I know them fairly well. I prefer Git by several orders of magnitude.

Vitaly · on Jan 25, 2010

We were working with hg for a long time and had a very elaborate workflow with queues. When we finally switched to git we missed our patch queues a lot. Fast forward to today, and I don't even remember exactly why we needed them so much :).

Today we do all the same, only better, with git.

If I had to draw parallels, I'd say hg's queues are used in a workflow like topic branches are used in git. It's a way to 'edit history' that is basic in git but requires a patch queue in hg (and doesn't really come close in terms of functionality and ease of use). And just like in hg you can't change a patch once you commit it, in git you can't (or rather shouldn't :) change it once you merge into a shared branch.

snprbob86 · on Jan 25, 2010

This is the sort of stuff I'm curious to hear. I have rudimentary experience with both Hg and Git. I have a project being managed in each and have found them roughly equivalent in basic use. I'd love to hear more details and stories from people who actually know both really well.

Actually, in general, I'd love to see more comparisons (not just VCS) that assume you understand the problem space and talk about nitty gritty details. I feel like I would learn more that way. Frequently, I find the same simplistic surface level arguments over and over, but rarely deep competitive analysis. I guess that kind of analysis is just harder and can only be done by people who have idiomatic expertise in both (and idiomatic experts are already rare).

utx00 · on Jan 26, 2010

so if you track project state with branches (for, say, versions) and you need to apply a 'new feature' to all these n branches, does that mean n merges for your team? not for ours.

in all fairness, there is stacked git. go crazy.

dlsspy · on Jan 26, 2010

Cleaning up history to be readable, clean and represent the actual net change to the project instead of every dumb error I made is gold.

A properly managed history makes a huge difference when it comes to dealing with code submissions. I tend to not take lists of changesets that include a bunch of "oops" because it takes me a long time to figure out what the user actually did.

In some cases, I have really long lists with really small net changes but a whole lot of experimentation in the middle. I can't take those. They rewrite the entire history of giant chunks of the code to put this guy's name on them because he changed something and then changed it back and then committed half a change and then committed the second half, etc...

You say, "but this is fine, it's how he worked!" Then you have to track down a bug and see that it occurred in the middle of this "experimentation phase." That's when you realize that you don't care at all about the method of discovery, but really wish you could just hear what the discovery was in one changeset that is as small and clean and well-thought-out as possible.

durin42 · on Jan 25, 2010

I assume you're using histedit, rebase, and/or other tools and not just mq for your hg history munging needs? mq is a great tool, but it's not always the best one.

Also, you might find bookmarks useful if you don't use them.

viraptor · on Jan 25, 2010

hg closes branches with `hg ci --close-branch` - or did you mean something else?

Deestan · on Jan 26, 2010

Yes, I mean something else. :) When you "--close-branch" a branch or merge it back to "default", it is closed and hidden in the branch list but still present in the revision tree, and you get warnings if you try to create a new branch later with the same name. In Git, deleted branches are just gone, as if they were never created. (Note that you have plenty of opportunity to restore them before garbage collection if you deleted one by accident.)

chousuke · on Jan 26, 2010

Branches don't really "disappear" in git, though. You're probably already aware of this, but for everyone not familiar with how git works:

Named branches in git are just moving references to commits. If you delete the pointer, the branch doesn't cease to exist, as it is present in the repository 'tree'. If you know the commit that was the head of the branch, you can easily create a new reference to it. The branch might eventually get garbage-collected if there is no other named ref that depends on it.

For example, if you merge "feature-foo" to "master" and then delete "feature-foo", the name is gone. However, the branch is still there (clearly visible in git-gui), and will not be garbage-collected because "master" depends on it.
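A toy model of that idea, for illustration only: commits form a graph, branch names are just entries pointing into it, and only commits that no remaining name can reach are candidates for garbage collection. The commit ids and the "experiment" branch are made up, and real git also consults things this sketch ignores (the reflog, for one) before collecting anything.

```python
# Commit id -> list of parent ids (a hypothetical, tiny history).
commits = {
    "a": [],
    "b": ["a"],
    "f1": ["b"],        # work on feature-foo
    "f2": ["f1"],
    "m": ["b", "f2"],   # merge of feature-foo into master
    "x1": ["b"],        # an experiment that was never merged
}

# Branch names are only pointers to a commit.
refs = {"master": "m", "feature-foo": "f2", "experiment": "x1"}


def reachable(refs: dict, commits: dict) -> set:
    """Return every commit reachable from some ref by following parents."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen


# Deleting a branch removes the name, not the commits.
del refs["feature-foo"]
del refs["experiment"]

live = reachable(refs, commits)
collectable = set(commits) - live
print(sorted(live), sorted(collectable))
```

Here the commits of the merged "feature-foo" stay alive through "master", while the unmerged experiment is the only thing left dangling.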

The "problem" with hg in my opinion is that it attaches more meaning to the name of the branch than is necessary.

Of course, git also allows you to destroy commits, and destroying ones that you have shared with other repositories may lead to complaints from other developers. But that's a social issue; git will not (and never should) dictate what you do with your commits, it just tracks them.

utx00 · on Jan 26, 2010

if you have topic branches in hg you can also strip them provided you didn't push them to a public repo.

utx00 · on Jan 26, 2010

one wonders how this works then, if Hg branches are so sub-optimal: http://hg-git.github.com/

you can re-write history in mercurial too - with the same obvious implications for both.

one thing that did go away with Hg was the need to merge unfamiliar code because someone upstream did a git-rebase.

undees · on Jan 26, 2010

It maps git branches to hg bookmarks (http://mercurial.selenic.com/wiki/BookmarksExtension) instead of hg branches, because bookmarks turn out to be a better fit.