Hacker News
Git Reset Demystified (progit.org)
126 points by schacon on July 14, 2011 | 54 comments



Every time I see an article like this, of this length, all to explain a single git command, I get the feeling that git (and all other decent SCM tools, for that matter) are way too complex. It's well written and well illustrated, but I feel like I shouldn't have to read that much just to learn a basic feature of a tool that's supposed to exist to help me. Is this just me?

Of course, the feeling is strengthened by the fact that I still don't completely grok git, but shouldn't a tool that's there to help me be easier to grok? I understood Dropbox in 1 minute. I learnt TDD in 3 hours. Why should (distributed) version control be so much more difficult?

I don't know how, just wondering if I'm the only guy who feels like this.


For me, git is remarkably simple. Its underlying data model is all that you need to understand (and there are only about four major types of object).

Everything about SCM then pops out. I found myself needing tools to manipulate the data model in some way, and a suitable tool always seems to exist.

It's kind of like the Grand Unified Theory of SCM. With such a simple core, everything about SCM just falls out.

I prefer to understand everything in this way, so git is perfect for me. But I can understand that it's probably far trickier for people who prefer to learn top-down, starting with workflow and processes, rather than bottom-up from first principles.


In this case it's a difference in levels of understanding. From the beginning of the article: "I never strongly understood the command beyond the handful of specific use cases that I needed it for." So in that frame of mind git-reset is very simple. "Oh crap, I messed up my working dir and everything's messed up! git-reset --hard!"

The purpose of the article, though, is to improve understanding beyond that, at an intuitive level. It's the difference between saying Bayes' Theorem is p(h|e,c) = p(e|h,c)*p(h|c)/p(e|c) and linking to http://commonsenseatheism.com/?p=13156

The thing I like about git is that yes, as a whole it's pretty complicated, but it's really not that hard to get up and running quickly especially if you use github. Then it's a gift that keeps on giving as you learn more about it while also becoming more proficient. Kind of like vim in a way, which is notorious for its wall of a learning curve, but if you just want to treat it as a normal text editor you can tell it to start up in insert mode and use gvim, then proceed to learn from there.


> From the beginning of the article: "I never strongly understood the command beyond the handful of specific use cases that I needed it for."

Note that this comment was written by the author of one of the most popular git books (Pro Git)! His dead tree book is #88 among Amazon's Software Development books. His Kindle book is #16 among Amazon's Software Development Kindle books.


You're not, and I think this is where a lot of flamewars between Git and Mercurial users and between DVCS and centralized VCS users start, so while I don't want to feed the fire, I will only cautiously recommend peeking at Joel Spolsky's Hg Init tutorial to see if Mercurial is a bit more natural for you, since a lot of people seem to think it's simpler (I have no opinion, mostly because I haven't used Mercurial much).

At the end of the day, for me it boils down to a comment by Xiong Chiamiov, made on one of Mike Taylor's rant-y blogposts about Git, which was much discussed on HN a while ago,

  "I use it because the benefits outweigh all of the things that you mentioned."
Yeah, it's complicated. But a lot of people think it's worth it.

edited to add links to the Reinvigorated Programmer posts, which I recommend reading because there's a ton of good commentary and suggestions for newcomers, even if the discussion got a bit heated:

http://reprog.wordpress.com/2010/05/10/git-is-a-harrier-jump...

http://reprog.wordpress.com/2010/05/12/still-hatin-on-git-no...


Thanks for the advice. However, I felt exactly the same with Hg. In fact, the main reason I have for currently using git and not hg is github.

I'm convinced that indeed it's worth spending more time to grok [git/hg/bzr] and really get the power out of it, but I just can't shake the feeling that somehow, it should be possible to make something with the same benefits that works much easier.

I was hoping that maybe Kiln is exactly that, but from their promo-page all I can see is a rant about how cool DVCSs are, and not much about how Kiln makes Hg easier.


Here's the key to git: embrace the fact that you're going to learn a new tool. Learn how to use it and play around with it. Once you're comfortable doing the basic things you would do day to day like create a repo, commit, push, pull, branch, merge, etc. then dig in deeper. Now you start reading about how git works under the hood to demystify the things you already know.

Articles like "Git for Computer Scientists" are great at this stage. Once you know how git's incredibly simple model works you are well poised to learn more advanced things. Rebasing can be hard for some people to understand and it's even harder if you don't know what commits and branches actually are.

After that you need little instruction and should be able to help yourself. It's totally worth it and it's not as hard as you think!

edit: Plus once you know how it works you get to skip big parts of articles like this and just skim around to pick up what you don't already know. It's pretty light reading actually.


I use Git for the same reason, although there's also Bitbucket and others for Hg. I don't want to act as an apologist for Git, because I think the user-interface could use some work (although I gather it was much worse earlier). I think at the end of the "Git from the bottom up" tutorial the author wishes the interface were better, but believes that Git will be the base upon which awesome things will be built. I don't know if anyone has tried to create a kind of porcelain-above-the-porcelain of Git, but that might be interesting, although from my experience with 3rd party Git GUI tools (Tortoise, EGit), many of which abstract details away, it can also be infuriating if done poorly.

I don't think the current DVCSs are the best thing ever possible, but I do think Git is one of the best things we have now.


Kiln is basically a Bitbucket competitor. It doesn't really change Hg (they do sponsor development); what it does do is provide a nice web front end and the rest of the tools a project would need (through FogBugz), like wiki, bug tracking, scheduling, etc.


Git is fundamentally pretty easy, but its terminology is confusing. Maybe this glossary helps:

commit: a snapshot of your files

tag: a reference to a commit

branch: a moving tag

HEAD: the current branch

index: the next unfinished commit

git-add: copy file(s) to index

git-commit: create a commit from the index

git-checkout: copy file(s) from a commit and redirect HEAD

git-reset: redirect current branch to another commit

git-revert: create new commit, which is the inverse of another commit

An interesting thing is that "the next commit" is an object you can work with in git. This concept is new for svn users. With git you can puzzle together a snapshot before it becomes a "real" commit. So mistakes like "Oops, I should not have committed that file" or "Oops, I forgot a little fix in another file" can be handled with git.
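
For what it's worth, the glossary above can be played with as a toy Python model. This is only a sketch for intuition, not how git is implemented: the class `ToyRepo` and all of its method and attribute names are made up for illustration, and `reset` here behaves roughly like `git reset --mixed`.

```python
import itertools

class ToyRepo:
    """Hypothetical in-memory model of the glossary; not git's real internals."""

    def __init__(self):
        self.commits = {}                 # commit id -> (snapshot dict, parent id)
        self.branches = {"master": None}  # branch: a movable reference to a commit
        self.head = "master"              # HEAD: the current branch
        self.index = {}                   # index: the next, unfinished commit
        self.worktree = {}                # files as they sit on disk
        self._ids = itertools.count(1)

    def add(self, path):
        # git-add: copy a file from the working tree into the index
        self.index[path] = self.worktree[path]

    def commit(self):
        # git-commit: freeze the index into a snapshot and advance the branch
        cid = f"c{next(self._ids)}"
        self.commits[cid] = (dict(self.index), self.branches[self.head])
        self.branches[self.head] = cid
        return cid

    def reset(self, cid):
        # git-reset (roughly --mixed): redirect the current branch, refresh the index
        self.branches[self.head] = cid
        self.index = dict(self.commits[cid][0])

repo = ToyRepo()
repo.worktree["README"] = "v1"
repo.add("README")
first = repo.commit()

repo.worktree["README"] = "v2"
repo.worktree["notes.txt"] = "scratch"
repo.add("README")        # stage only README; notes.txt stays out of the commit
second = repo.commit()

repo.reset(first)         # the branch label points at the first commit again
```

After the last line the branch and index are back at the first snapshot, while the working tree still holds the newer files and the second commit still exists in `repo.commits`.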


TDD isn't a tool, it's a process of developing software.

This is just the internal stuff; you DON'T need to know this to use git on a day-to-day basis. I use git for all my projects and most of the time I just type `git commit -a -m "lol message" && git push`.

Do you need to know everything about MS Word in order to become productive with it? No. Apply the same concept to Git or any software really.


True that, and that's what I do (well, I don't type that command but I right-click in Explorer and choose Git Commit, but I see your point).

I guess the difference is that if there's something that Word can do that I want to figure out, there's good Help and I can get pretty far by scanning the menus. With git, good help exists too (in the form of blog posts like this article), except that the concepts are significantly more complex than those involved in making automatically numbered chapter headings.

In short, in my experience figuring out how to do something non-trivial with Word is exploration. Figuring out how to do something non-trivial with Git is more like studying an advanced CS class.


I agree with you. It seems like the basic concepts are neat and clean, but that the interface to it, and the vocabulary, is counter-intuitive. The article really gives me the impression that git-reset can be used to do several vaguely related things, with an inadequate terminology (--soft, --mixed, --hard?).

Besides, it doesn't help that the first commands you learn about git (like git add, git commit -a, etc.) actually hide most of the complexity in a treacherous way (especially since they do not advertise the existence of the index).

It's probably possible to learn git efficiently by approaching it as something abstract, not as that stuff you want to be quickly productive with. Sadly, you probably can't realize that unless you already wasted your time on the misleading tutorials.


It is more that the git API uses terms that have been used in past VCSes in different ways, so it is a matter of helping new users to unlearn their original mental model from older VCSes and to learn the new model. I am mostly convinced that the new definitions of terms are more accurate.

"git reset" in one sentence: git reset allows you to move the HEAD's branch tag to any other git revision. (It is easiest to think about git in terms of labels moving around a tree of nodes.)


There you go. What's a "branch tag"? What do labels do in "trees of nodes"?

Mind, I don't mind tackling this kind of complexity when I do algorithms for getting something done. I'm perfectly capable of understanding what you mean if I'd really invest the time in it that I should. But why do I need to do such (relatively) complex yet abstract thinking when I want to store a README, undo a mistake and then share it all with a colleague?


I used to be right there with you. I knew svn and I just wanted to port that knowledge over to git and start working. The problem was that the git interface is similar enough to svn that it lets you use it almost like svn (add, commit, status, checkout, etc) and leads you to believe that it is fundamentally like svn. So whenever the first non-svn thing comes along it suddenly seems way too complicated. It seems like it's svn except with a whole lot of extra complicated crap added on. I told myself I'd eventually climb this gigantic-seeming learning curve, but I kept putting it off because it looked like such a huge obstacle.

One day I finally sat down and started going through the Pro Git book. After reading about the fundamentals of git I finally realized that this is ultimately no more complicated than a first year data structures class. It's just a stupid directed graph. All this wasted time scratching my head, only to find out that it's just a directed graph with labels attached to the nodes called 'branches'. All the crazy git commands with all the crazy options are just hacks that let you look at and fiddle with this stupid graph. Ultimately when you want your repository to be in a certain state, you first figure out what you want the graph to look like and then you use whatever git commands you want to make it so.
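
To make the "directed graph with labels" picture concrete, here is a minimal Python sketch. The commit names, the `parents` and `branches` dicts, and the `reachable` helper are all invented for the example; real git stores far more per commit.

```python
# Commits form a directed graph: each node points at its parent(s).
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],        # work done on master
    "d": ["b"],        # work done on feature
    "e": ["c", "d"],   # a merge commit has two parents
}

# Branches are just labels attached to nodes.
branches = {"master": "e", "feature": "d"}

def reachable(start):
    """Return every commit reachable from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

history_before = reachable(branches["master"])

# Moving a label changes which part of the graph the branch "contains";
# the nodes themselves are untouched.
branches["master"] = "c"
history_after = reachable(branches["master"])
```

Under these assumptions, "what do I want the graph to look like" becomes a question about which node each label should sit on.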

It's hard to explain how fundamentally simple it is. The actual interface doesn't make it seem like it's simple, and all the terms that everything has don't make it seem simple, but it is all a bunch of tools and terminology built on top of a simple data structure. Unlike many tools, it is easier to understand the internals of git than the externals. But once you understand the internals, then you can practically speed read an article like this in the same way you could speed read an article explaining for/while loop syntax.


> Unlike many tools, it is easier to understand the internals of git than the externals.

But why should that be? For example, the git docs use different terms for the same thing (e.g. cache, index, stage).

Also, the git tools overload the same command for different purposes (e.g. git add tracks new files or stages pending changes from existing files; git reset can unstage pending changes or revert committed changes). These are different functional procedures that happen to share implementation details.


Don't use "git reset" to revert committed changes. Well... do, but be careful... The word "revert" is problematic here, and I suppose that "reset" is a bit awkward, too.

In git, a "revert" means you are creating a new commit that is the reverse of the commit you want to undo. The two commits exist and effectively cancel each other out. The danger here is potentially thinking "git reset" will revert a committed and pushed change (sometimes people push early). In general, this is probably not what you intend.

"git reset" moves branch pointers around. So when you "git reset HEAD^", you are really saying you want to move your current branch and HEAD (the thing that indicates where you are in the tree) to the prior commit. The current commit still exists but can be ignored.

Speculation now, but I think it is right... This same understanding applies to the "unstaging". The staging area is another tree node, and when you say "git reset HEAD" on the staged file, you are moving it back into the HEAD state.

One note on unstaging and other resetting... You can reset --soft, --mixed (the default), or --hard. With --soft or --mixed your working directory is left alone, so the differences between the file's current revision and the reset revision show up as local changes (--soft keeps them staged, --mixed unstages them); --hard throws them away. The first two are useful if you are cleaning up an unpublished branch (using reset to undo).

"git reset" could be called "git move-my-branch-tag" and make more sense.
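
A small Python sketch of the "move my branch tag" idea, including what each reset mode touches. The `commits` and `state` dicts and the `reset` function are hypothetical stand-ins for git's real storage; the behaviour modelled is that every mode moves the branch, --mixed (git's default) also resets the index, and --hard also overwrites the working tree.

```python
# Two hypothetical commits, each a snapshot of a single file.
commits = {
    "c1": {"f.txt": "v1"},
    "c2": {"f.txt": "v2"},
}

# The current branch sits on c2, with index and working tree in sync.
state = {
    "branch": "c2",
    "index": {"f.txt": "v2"},
    "worktree": {"f.txt": "v2"},
}

def reset(state, target, mode="mixed"):
    """Toy version of `git reset [--soft|--mixed|--hard] <target>`."""
    new = dict(state)
    new["branch"] = target                       # every mode moves the branch
    if mode in ("mixed", "hard"):
        new["index"] = dict(commits[target])     # --mixed and --hard reset the index
    if mode == "hard":
        new["worktree"] = dict(commits[target])  # only --hard touches files on disk
    return new

soft = reset(state, "c1", "soft")
mixed = reset(state, "c1")          # no mode given, so "mixed"
hard = reset(state, "c1", "hard")
```

In all three results the commit c2 still exists in `commits`; nothing points at it from the branch any more.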


btw, the term "revert" has a pretty standard usage in most version control software, including Subversion, Mercurial, Perforce, Bazaar, Darcs, Monotone, Fossil, and AccuRev. Why did git not only invent a new term for discarding uncommitted changes, but reuse "revert" to mean something different?


Good question for the devs. "git apply-reverse-patch" was probably too long for a fairly common operation? I suppose when you are trailblazing with a new paradigm, you are allowed to redefine things. I don't let those kinds of things bother me, personally; I just adapt my thinking to the new thing I am learning. Helps me to learn new languages easier.


It's kind of a silly guessplanation, but I have a feeling that Linus Torvalds made the call and it stuck. I don't know anything about the people who actually maintain Git, though.


But they aren't fundamentally different procedures. They are different use cases of the fundamental "change tree" procedure.


> What's a "branch tag"? What do labels do in "trees of nodes"?

I'd recommend "Git from the bottom up" or Scott Chacon's "Getting Git" screencast. Both tackle Git for newcomers by describing the underlying data structures. Once I had a cursory overview of what Git was actually doing under the hood, so much else clicked for me.


> why do I need to do such (relatively) complex yet abstract thinking when I want to store a README, undo a mistake,

You have a legitimate point, and I think it is because your level of understanding, your need to know, depends on the actions you want to take. At first, it is sufficient to know "git add <new file>" and "git commit -a". You will be challenged as soon as you need to do something beyond the very basic linear, like "git push" and "git pull". Branching, merging, sharing... Version control is just not dirt simple. There is fundamental theory to it. If you work with others, lacking that theory will lead to pain in any VCS (starting with a fear of merges and conflict resolution).

These things that you consider simple actions begin to add up, and at some point, your mental model needs to adjust in order to grasp the relationships amongst these things. Your model necessarily becomes more abstract.

I am not convinced that that means it is more "complex", at least significantly enough to be a real barrier to moving forward with git.


Sure, but there are already "tools to help you" in this realm that are easier to grok.

Early on, developers saved old versions of code via manual copying. That was really easy to grok (Label floppy with "1983 Aug 12", copy, put in box). But it sucked.

So developers invented things like SCCS or RCS, which stored metadata. But they were harder to grok, as now you had to understand the idea of files having history (and branches), and needing to be "checked out" using tools entirely different from standard copying, and being "locked" against change. But they helped, even though they still kind of sucked.

So hackers invented CVS and Subversion, which worked on trees instead of files, used implicit locking and merge algorithms, and generally made life better. But they were yet harder to grok! Now doing regular development meant that occasionally you'd get collisions: you needed to understand the 3-way merge algorithm just to write code. And they still sucked.

So Linus gave us git. Which adds a bunch of new hard-to-grok abstractions like the index and commit ID. But it fixes a bunch of problems too, even if 10 years from now we'll agree that it sucked too.

So pick your point along that spectrum and choose your tool to match your comfort level. In short: your complaints aren't anything new; SCM systems have always been confusing to learn.


Why did everything suck? Are we even addressing the problems as we trudge along your linear development path? I think what your parent was getting at is that increasing complexity of SCM software is orthogonal to the problem, which hasn't really been defined well.

It's easy to say that something sucks, but it's harder to say why.


It can take a long time to understand something simple but alien. The advantages of familiarity are short-term; the advantages of simplicity are forever.


I personally feel the problem is just that the terminology doesn't really match between tools. Coming from SVN, I think of something I want to do and I think of the SVN command for it, but it turns out the Git command has a totally different name, so I spend 10-20 minutes being confused, especially when there is a command of the same name that does something different.


Git is more comparable to Vi or Emacs than Dropbox or TDD. Or compare it to grep or sed.

The article is intended for deep understanding, not as a tutorial.


Git is not wrong by itself; there's just an impedance mismatch between its model and a usable interface model. For most purposes, the index should be completely hidden.


I would respectfully allow that anyone who can't comprehend what the index is for with the aid of two paragraphs and a diagram should be looking for an alternate career. The index is important. It's one of the great things about Git; without it, you wouldn't be able to create commits from a subset of the difference between your working directory and HEAD. It also has a confusing name, but here we are.


> It's one of the great things about Git; without it, you wouldn't be able to create commits from a subset of the difference between your working directory and HEAD.

I never understood that. I can do this just fine even with TortoiseSVN. Just click "commit" and select the files you want to include in this commit. I don't see how I need to keep yet another data structure / piece of "state" in the back of my head for that. I definitely don't see why I have to look for an alternate career because of that.

Am I missing something?


The benefit that the index gets you in such a situation is that sometimes you are working on two unrelated changes at the same time, but they both touch the same file. Git (using the -p flag to the add command) lets you interactively select portions of a file to add to the index.

The interface is pretty simple: it just shows you each small piece of diff to the files you are adding, and you say whether you want to include that bit of diff in the index or leave it in the working directory. Alternately, it can drop you into a view of the diff in the editor, and you can add or modify diff lines as you please.


Honestly though, if you're doing that you should be using two different branches. Which is another thing git is wonderful for.


Yes, you are; SVN doesn't allow you to commit parts of a file -- it works at the file level. Git allows you to commit an arbitrary subset of changed lines from whatever files you've changed in your working copy. It doesn't track changes to files, it tracks changes to content in its tree. See here for a more complete explanation: https://git.wiki.kernel.org/index.php/GitFaq#Why_is_.22git_c...


> Am I missing something?

Yes. In git you can also do 'git add -p', which lets you interactively add pieces of a diff to the index, not just entire files.

Now, you could imagine an interface where you interactively select the pieces of the diff when committing, and I believe some VCSs do this. So, even though you were missing something, you weren't necessarily wrong. :)

But having the index can be useful if you want to build up the things you're going to commit over separate 'git add -p' sessions.
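
A rough Python illustration of building up a commit over separate staging sessions. The hunk records and the `stage` helper are invented for this example and only mimic the idea behind `git add -p`; they do not reproduce its actual interface.

```python
# Hypothetical hunks from one file's diff, mixing two unrelated changes.
hunks = [
    {"id": 1, "topic": "bugfix",  "patch": "+check for None"},
    {"id": 2, "topic": "feature", "patch": "+add --verbose flag"},
    {"id": 3, "topic": "bugfix",  "patch": "+regression test"},
]

def stage(pending, index, wanted):
    """Move the hunks accepted by `wanted` into the index; return the rest."""
    remaining = []
    for hunk in pending:
        if wanted(hunk):
            index.append(hunk)      # goes into the next commit
        else:
            remaining.append(hunk)  # stays in the working directory
    return remaining

index = []
# Two separate "sessions" contribute to the same pending commit.
left = stage(hunks, index, lambda h: h["id"] == 1)
left = stage(left, index, lambda h: h["id"] == 3)

staged_ids = [h["id"] for h in index]
unstaged_ids = [h["id"] for h in left]
```

Here the bugfix hunks end up in the index and the feature hunk is left behind for a later commit.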


> Now, you could imagine an interface where you interactively select the pieces of the diff when committing

There's no need to imagine -- git-gui and git-cola can do this.

http://cola.tuxfamily.org/

Click on a modified file, select specific lines from the diff, right-click, and click on "stage selected lines".


> [W]ithout [the index], you wouldn't be able to create commits from a subset of the difference between your working directory and HEAD.

That's a correct argument for why the index has to be separate from your working directory. But I don't see why the index has to be separate from HEAD ... in my model, every "git add" would automatically be followed by an implicit "git commit --amend". So instead of building up your commit in the index you build it directly onto HEAD.

Since you're remotely sane, of course HEAD is a private branch not a public one (because you're perpetually modifying "history" on it). (And to start a new commit, of course you also need a command which advances HEAD by creating an empty commit.)

To put it another way, the index should be just another branch. The git commands are way too complex because they don't treat it orthogonally.


I appreciate that you should have the option to choose which part of your working directory you commit. But I think the default should be to commit everything, like in SVN or Mercurial. If I need to commit selectively in those systems, I'll disable some checkboxes, type in the filenames, build a changelist or use the record extension or something. The point is that I don't need to deal with it until I need it.

Also, it seems to me that git's behaviour encourages broken revisions (i.e. compiler errors, test failures) because your working directory doesn't match what you commit. And broken revisions will interfere with bisecting.


I think committing everything in the working directory is an awful default. Few developers are disciplined enough that the contents of a working directory always make a perfect commit.

With git, how you code is a non-issue. The whole idea is that you're able to worry about commits afterwards. The index is a wonderful tool to help you untangle the mess of code that has not yet been separated into logical pieces. Working with git is not just writing out code and then committing it. Making proper commits requires time, discipline and practice.

Part of the problem might be the mindset that a VCS is supposed to record how the development happens, but if you think about it, that does not make much sense. The actual development process of a feature or even a bugfix is often riddled with experiments, trivial mistakes, sidetracking, and other largely uninteresting issues.

Once you have thought about what is logical to record into the repository and create a commit, you can test it. Git stash allows you to put aside all other work while you run tests, and git commit --amend allows you to fix the commit until it works.

Test your commits, and you will not have broken revisions or unbisectable history.


Yes, this is legitimate criticism of git's philosophy.

However, in practice I have seen those broken revisions in svn just as well. People forget to add untracked files or they skip the final unit test run. Basically, I believe your criticism is theoretical and no problem in practice.


Git requires a bit of a learning curve to use but it's gotten far simpler than subversion for me. I used subversion for years before using git. I refuse to go back.

If you always want to commit everything use `git add . && git commit -a`.


For me, I could learn git no problem, write wrapper scripts for some things, etc... But I'm not explaining its idiosyncrasies to other people on my team.


It's also vital when merging. Agreed, too bad about the naming problem. Index, cache, staging area... how many names does it need?


In a sense, the index is an acknowledgment of the fact that the operation of "creating a commit" should be a two-phase process, similar to a database transaction.

Phase 1: Choose which parts of the working directory changes should be in the upcoming commit.

Phase 2: Actually commit those changes.

The separation is not entirely clean in Git. While this is very appreciated for convenience (e.g. git commit -a), it blurs the line and does cause confusion.


Indeed, the ability to work on code now and think about a commit later is freeing. As for the particular abstraction of how to represent the commit in progress... I like the working tree for that, rather than the index. It's just a better mental fit for me to unstage stuff I don't want to commit (via shelve/stash), _rerun unit tests_, do a simple diff without needing to remember the incantation for seeing what's in the index, and commit.


Scott, thank you so much for all your work on Git tutorials, blog posts, screencasts and books... I'm sure you know this already but you're a great teacher!


I second that. Pure excellence at http://progit.org/


It's ironic that this is on the front page at the same time as Linus's proposal to change the commit object format, in which he claims that the lack of generation numbers is Git's "only real design flaw" [1]. Okay, maybe he's only talking about the plumbing, but if it needs long guides like this to understand one command, I think maybe that would constitute a design flaw.

I like Git, I really do. This article helped me like it more. There has to be a way to get the benefit of that object model without all the grief at the interface.

[1] http://news.ycombinator.com/item?id=2765844


Good overview, for beginners as well as regular users.

In particular git checkout <file>, which is (a) not obvious (before stumbling upon it, I was trying to find it in the git-reset documentation), and (b) looks pretty innocent (as git checkout <branch> is quite safe to do if you've got staged changes).


git is an extremely powerful tool, but its interface is far from smooth. Maybe that will come with time. For example, the usual pattern of a copy command is:

copy source destination

The pattern of the git push command is:

push destination source:destination


1. There is no built-in copy command in git.

2. What is confusing about "git push <remoterepository> <localbranch>:<remotebranch>"?


Git is like that "one data structure with a hundred functions operating on it, instead of ten functions operating on ten data structures".


And that "one data structure" is a directed acyclic graph of objects, where an object is one of four kinds:

- blob (like a file)

- tree (referencing trees and/or blobs, like a directory)

- commit (referencing one tree and other commits)

- tag (referencing a commit)
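
As a sketch of how these objects reference one another, the snippet below computes object ids the way git does for loose objects: SHA-1 over a "<kind> <size>" header, a NUL byte, and the body. The helper name `object_id` is made up, and the commit body is deliberately simplified (a real commit also carries author and committer lines), so that particular id would not match one produced by git itself.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# blob: just the file contents
blob_id = object_id("blob", b"hello\n")

# tree: each entry is "<mode> <name>", a NUL byte, then the raw 20-byte id
tree_body = b"100644 README\0" + bytes.fromhex(blob_id)
tree_id = object_id("tree", tree_body)

# commit (simplified): names one tree; parent and author lines are omitted here
commit_id = object_id("commit", f"tree {tree_id}\n\nfirst commit\n".encode())

# The well-known ids of the empty blob and the empty tree fall out of the same rule.
empty_blob_id = object_id("blob", b"")
empty_tree_id = object_id("tree", b"")
```

Because every id is derived from content, changing the blob changes the tree id, which in turn changes the id of any commit that points at that tree.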



