High-Level Problems with Git and How to Fix Them (gregoryszorc.com)
391 points by jordigh on Dec 12, 2017 | 286 comments



One personal anecdote:

In a decade of using Mercurial, I've managed to get a repository in such a confused state that I had to blow it up and start from scratch just once. In the same time, I've had to do the same for git repositories at least 5 or 6 times--and I never really used git all that much. Even nowadays, where I'm more or less forcing myself to use git [1], I'm incredibly hesitant to try any sort of complex git operation because of the risk that I might lose my in-progress commits.

While the problems of git's UI are well-known, I don't think it's as appreciated that git's complexity generally discourages people from trying things out. I'm never really sure what could cause me to lose unpushed commits or uncommitted working files, especially with git's predilection for reusing the same verb to do slightly different operations for different concepts. In stark contrast, it's more difficult to get Mercurial to chew up your work without telling you [2], and it's easier to explore the command lists to see if there's something that can do what you want it to do.

[1] The repositories I'm working on involve submodules, which hg-git is exceptionally poor at supporting well. Even so, the list of git alias commands on the whiteboard is entitled "Forcing myself to use git before I rage-quit and switch to Mercurial."

[2] I have done it, though. Worst offender was discovering that hg qrefresh <list of files> causes only those file changes to be committed, and files not mentioned are decommitted to working-directory-only changes... which hg qpop happily deleted without warning me, causing me to lose a large patch. That was when I put the patch queue directory in version control.


> I don't think it's as appreciated that git's complexity generally discourages people from trying things out.

For me, at least, the underlying model actually makes it easier to try things out. As long as my work is in commits and I have refs that point to those commits, I can be pretty sure that I'm not going to lose anything. I also take a fair amount of comfort in the fact that (unlike, say, SVN), any remote actions can be deferred until I know things are correct locally.
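
One cheap habit that makes this concrete (just a sketch; the branch name is arbitrary):

  git branch safety-net        # pin the current commit with a throwaway ref
  # ...experiment freely: rebase, reset, whatever...
  git reset --hard safety-net  # worst case: put the branch back where it was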

This sort of robustness in the data model makes it a bunch easier for me to forgive the (very real) usability issues in the command line UI. (And, after years of work, the command line is getting more familiar, aside from some of the more complex rebase operations.)


> I don't think it's as appreciated that git's complexity generally discourages people from trying things out.

Personally, I'm not shy about recursively copying the entire repository on disk to run potentially destructive experiments in a duplicated environment.
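
e.g. something like (paths made up):

  cp -a ~/src/myrepo /tmp/myrepo-scratch   # full copy, .git and all
  cd /tmp/myrepo-scratch                   # run the destructive experiment here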

Also, once you understand how to pull things out of the reflog, it's usually not hard to restore to a sane state.


> I'm not shy about recursively copying the entire repository on disk

Agreed.

> Also, once you understand how to pull things out of the reflog,

That, and I'll also dump a git log of the last couple dozen commits into an editor buffer so that I can hang onto the commit ids. Until a GC cycle occurs, you don't lose the commits even if they aren't reachable through normal paths.
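
Something like (the count is arbitrary):

  git log --oneline -n 30 > /tmp/recent-commits.txt   # keep the ids around, just in case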


`git reflog` gives you the historical list of SHAs that you've pointed HEAD at; even if you destroy your branch with a rebase, you can still see what commit your branch pointed to before you started the operation.
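
For example, after a botched rebase (the reflog entry numbers will differ):

  git reflog                 # e.g. "abc1234 HEAD@{5}: rebase (start): checkout master"
  git reset --hard HEAD@{6}  # the entry just before the rebase began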


Yeah... I just find the printed log output a bit easier to read (at least for the few times I need to do this sort of thing). '@' syntax falls into that same basic category - powerful, the better solution, but nothing I use enough to bother mastering.


I agree.

I want a VCS that does NOT try to hide how it works from me.

I love that git gives me a) a bag of commits forming one or more rooted trees, b) symbolic names for commits/branches that resolve to individual commits in that bag.

That's such a simple model.

Remotes. Commits. Branches/tags. That's it.

Merging and rebasing merely create new anonymous branches in the bag of commits, then move a branch name (if applicable) to point to the new HEAD. Done. Trivial.

The reflog is just a local log of actions taken, and can be used to quickly recover from missteps: just find a HEAD you want as your branch's head, and then git branch -f branch <commit>.
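
Concretely, a recovery might look like (the branch name and commit id are hypothetical):

  git reflog                      # find the HEAD you want, say deadbee
  git branch -f mybranch deadbee  # point the branch back at it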

The index/staging area are very, very useful, mostly because of git add -e, which is extremely powerful. This lets me split up my commits usefully.

Just about everything else is stuff you can figure out as you go.

The only thing you can ever lose in git (short of doing rm -rf .git) is workspace content that you have not added to the index or committed. Stay away from git reset unless you know what you're doing with it, and all is good.

Being able to think of the state of the repo as a tree of anonymous commits, with a name resolution table for resolving branches/tags to commits, makes doing complex tasks trivial for me. I often rebase in detached head mode and nothing goes wrong; when I'm done I manually move/create a branch to the new HEAD -- I can do this because the tree-of-anon-commits model is so darned simple.


Now, can that be explained to a git newbie in under half an hour, to get them up and running?

IMO, the scary UI/UX of git is holding back the use of version control for a lot of digital documents beyond just source code. It needn't be this complicated.


I've tried to get newbies to use SourceTree on OS X, with Mercurial or git, and it never goes well. Version control is just hard the moment you want to do anything other than linear commits. The moment you want to undo something partially, it gets ETOOHARD for most non-developers, and especially people like my mom.

On the other hand, when explaining the git model to sysadmins and developers, not sugar-coating it has worked best for me. I explain the bag-of-commits+nametable design. I explain the workspace and staging area. I show how one does things. This works for me.

Maybe git's UI "needn't be this complicated", but I do find that Mercurial's heavy-duty branches lead to hell very quickly and it's always difficult to recover -- and Mercurial is supposed to be simple! The lesson is that simplifying isn't easy, and not every simplification works well generally.

With git I can simplify in that I show users a small amount of abstract concepts (bag-of-commits, nametable) and a small set of common usages and then let them learn by experience and osmosis as they reach out for help when they stray past those usages. I can't do this with other VCSes. At some point it just becomes painful.

One of the nice things about git is that I can push users to make very linear history. This is nice because diving into the history of a tangled mess of merged branches is not easy, and it's actually very confusing to non-professionals. But even for professionals, linear history is far superior to a tangled mess of merged branches, and only git really gives us that power today.


> I'm never really sure what could cause me to lose unpushed commits...

Effectively nothing. By default, old, unreferenced data is only garbage collected after a month. If you screw up a rebase, the old commits are still there with the same hash they always had. You just need to point the branch at them again.

> ...or uncommitted working files

That's fair. If I'm unsure, I just stash or commit first. Anything that's in git can be retrieved.


Sometimes I find myself doing "git stash; git stash apply" just so I have a checkpoint of what I'm working on. That plus "git stash list -p" lets me go back through tons of non-checked in junk to find (say) that one line of debugging code that turned out to be pretty useful…


You can do small temporary commits on a branch, and once you want to do an actual commit which will then be pushed, you go onto the master branch and `git merge --squash <work-branch>`.


This.

Git makes using branches very easy and it is fairly quick with most operations. Once you understand what the commands are doing, you'll see that unless you do something specifically (like reset --hard with uncommitted changes, checkout a modified file etc.), you'll never lose work which has been committed in some branch or the other.

I tend to just do "in-between" commits very often and rebase to reorganise the commits (reorder, squash, reword) when I hit a (micro)milestone.

If your work has been committed and the repo was not just garbage collected, it is going to be easy to retrieve it from reflog even if you did something nasty with the branch and it appears that your commit has disappeared.

Git is a piece of software which I find very good in terms of what it does, but it does it behind the worst user interface I have ever encountered.

EDIT: minor clarifications and punctuation


Microtutorial for git:

`git clone https://repository.url` (download repository)

`git checkout -b issue_XXX` (create new branch with the name 'issue_XXX')

`vim README.MD` (edit a file)

`git add README.MD` (mark the file to be committed, `git reset README.MD` to unmark)

`git status` (see list of changed/marked files)

`git commit -m "Add Readme"` (create commit based on marked files)

`vim README.MD`

`git add README.MD`

`git commit -m "Fix Typo"`

`git checkout master` (switch to branch with name 'master')

`git merge --squash issue_XXX` (merge branch with name 'issue_XXX' into current branch (master), but stop before committing)

`git commit -m "Add Readme.MD"` (create a new commit)

`git push` (upload changes)


yeah, I do this too.

PSA: `git reflog show stash` shows the stash list along with their short hashes, for easy reference.


> If you screw up a rebase, the old commits are still there with the same hash they always had.

Disclaimer: if, like me, you're compelled for purely neurotic reasons to run `git gc` periodically, these old commits will get thrown out. I found this out the hard way :(


Yeah, don't do that


> I don't think it's as appreciated that git's complexity generally discourages people from trying things out.

The irony here is that the model that git's internals use isn't really complex at all; it's the poor porcelain and poor naming that makes it seem complicated.


> In a decade of using Mercurial, I've managed to get a repository in such a confused state that I had to blow it up and start from scratch just once. In the same time, I've had to do the same for git repositories at least 5 or 6 times--and I never really used git all that much.

I have used git for a long time, but never arrived at such a state, apart from when I was converting an existing SVN repo to git (where starting the process anew after 10 minutes of experimenting was faster than restoring refs). How did you manage to do it?


> How did you manage to do it?

Same way everyone else manages it, including myself: abject ignorance. You and I (as now more experienced git users) know that it is extremely unlikely to actually be in a state where blowing away the repository is necessary. Like you, I can't think of how one would pull that off. That's why I tell less experienced users "if you've committed, even just locally, you can't lose your work without really trying. If you get in trouble, don't use the '-f' switch, and come get me, we'll get it sorted out."

But that was not me nine years ago as I blew away the repository again because I got in a confused state (me, not git). And be honest, that was you nine years ago, too, before you wrapped your head around git.


I'd like to think I know git pretty well... I understand the internals, I've written CLI tools that utilise all the plumbing commands, and I understand the different types of objects, Merkle trees, and the various forms of references.

Yet once, when I had a power failure by chance the second I fired the commit trigger... I was pretty hopeless at trying to clear up the mess of objects it exploded everywhere. I couldn't be bothered and just re-cloned the blasted thing; I hope it never happens to me again.


I very much doubt that this is a Git issue, but suspect that it is instead a file system/OS/hardware issue. The problem is that operating systems generally offer only very limited guarantees about the atomicity of the bits actually being physically stored on a device (at least guarantees that can be used with reasonable efficiency).

It should not normally be a problem, but a power failure just at the wrong time seems the prime suspect to me; there's nothing in Git's logic that should normally allow for such a problem.

See, for example, the "Failure to sync" section of SQLite's page on "How to Corrupt an SQLite Database".

[1] https://www.sqlite.org/howtocorrupt.html#_failure_to_sync


It is a git issue... File system atomicity is not the issue: creating a commit is a process of creating a whole collection of blobs and trees and ultimately a commit tree object; this is fundamentally the way git works. There are many external reasons it could fail halfway through creating that tree of objects (objects are files), and that has nothing to do with atomicity.

This type of incident is not irrecoverable, but it will hella-waste-your-fucking-time (FYI I really like git, but it's far from infallible)... Try it out and you will have a fun time following the trail of dangling objects.
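
If you ever do want to follow that trail, the starting point is roughly this (a sketch; the sha is whatever fsck reports):

  git fsck --lost-found   # writes dangling commits/blobs into .git/lost-found/
  git show <sha>          # inspect each recovered object by hand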

My point is not that git is fundamentally flawed (fundamentally it's extremely resilient and elegant), but merely that "commit" being a porcelain command should provide a simple way to recover from arbitrary failure without having to dive deeply into the internals and plumbing commands (it's another CLI UI failure)... I knew what was wrong but it was such a pain to reconcile that I resorted to re-cloning, imagine what a confusing mess it would appear to a regular user.


Um... a mission-critical data store shouldn't be affected by a "power failure at just the wrong time".


Sure, but the same things could happen to (say) a relational database under the same circumstances. There's nothing that you can do if (say) the hardware lies to you about having written bits to disk or reorders writes. I was citing the SQLite page for a reason. You can only work with the tools that the OS gives you.


Git is not a mission critical data store. Corruption should cause a bad day, not a disaster.


> In a decade of using Mercurial

> I never really used git all that much

Does not sound like a fair comparison ;)


It's a valid comparison for his point that Mercurial is much more reliable for his workflows than git.


I have used git a lot, and rarely, if ever, get it into a confused state. I have used Mercurial less, and I have found it easy to get things tangled up without a clear way out.

It kinda sounds like the system you know best is the one you can handle better, regardless of whether it's git or Mercurial.


> It's a valid comparison for his point that Mercurial is much more reliable for his workflows than git

To me it reads as a warning that Mercurial repos are prone to being screwed up even if the user has over a decade of experience avoiding Mercurial's traps, while with Git only bold inexperienced users can screw up repos in a similar manner.


A decade's worth of experience vs beginner-level experience could explain the issue. Not saying it does, just that it could.


In response to [2], these days MQ is deprecated and there's a big push to get people to use evolve, which avoids this mess and is much safer.


Just keep a copy of the `.git` dir before you try the complex operation. If you screw it up, restore it back and voila!
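
For instance (the backup path is arbitrary):

  cp -a .git /tmp/git-backup                  # snapshot before the risky operation
  rm -rf .git && cp -a /tmp/git-backup .git   # if it goes wrong, restore and carry on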


No. This does not safeguard any uncommitted files.


Leaving any files uncommitted before a complex operation in git is a really bad idea (make a temporary commit if necessary). It's almost impossible to completely lose committed data, but pretty easy to destroy uncommitted data.
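
A minimal version of that temporary-commit dance (the message is arbitrary):

  git add -A && git commit -m "WIP checkpoint"   # protect uncommitted work first
  # ...do the complex operation...
  git reset --soft HEAD^                         # dissolve the checkpoint; changes end up staged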


This. I also do a `git diff > save.patch` if there is something uncommitted that I don't want to commit at the moment. A patch file feels more comfortable to me than git stash, as I can easily view the patch files, and reapply them with `patch -p1 < save.patch`.


You can show the diff with `git stash show` or also apply your patch with `git apply save.patch`


No. Commit your changes and mark it. For instance by creating a branch.

  git commit -a
  git checkout -b before-complex-operation
  ...

To get back to where you were: `git checkout before-complex-operation`


wow.

> I'm incredibly hesitant to try any sort of complex git operation because of the risk that I might lose my in-progress commits.

The correct way of learning git: every time you want to use a "complex"/dangerous command in git, `cd /tmp` and create a git repository there. Then create some commits and branches, and experiment. You will NOT lose anything, but you will learn a lot.

I did and do this every time I have doubts, and with time you will understand how non-dangerous git commands are.

"dangerous" part comes from the user, not the software, and it is okay. when you accidentally kick a corner of a furniture, that does not make that furniture "dangerous".


If this is the "correct" way to learn, the problems are not with the user.

A well designed ui should be learnable while being used - creating a fresh learning repo for dedicated learning for fear of losing data while using the tool in your "real" repo is the ultimate red flag in terms of usability.


> A well designed ui should be learnable while being used

Strongly disagree with that one. A UI that lets you work in the most efficient way (e.g. vi) can also be well designed, even if it's incredibly difficult to learn.

> creating a fresh learning repo for dedicated learning for fear of losing data while using the tool in your "real" repo is the ultimate red flag in terms of usability.

Again, strongly disagree. Git is a powertool. You don't let loose a powertool if you don't understand how it works. Creating a fresh learning repo is merely acknowledging that you don't know the tool well enough yet. There's nothing wrong with that.

There's a big difference between being noob-friendly and being user-friendly. Personally, I hate noob-friendly tools, because they tend to just get in the way.


> Strongly disagree with that one. A UI that lets you work in the most efficient way (e.g. vi) can also be well designed, even if it's incredibly difficult to learn.

See, I don't think you do disagree. I said a UI should be learnable while being used - vi is. I said nothing about that learning needing to happen quickly.

I use vim every day; it's not my primary editor, but I'm incredibly comfortable with it, which I wasn't for years after I started using it. But during those years of learning, I never got stuck with an unusable dev environment, I never lost data, I never had to Google how to proceed in order to get my work done. If I ran into a problem, I had the option of opening an alternative editor and continuing with my nice non-corrupted dev env, or I could choose to spend some time figuring out the vim way (and learn something).

vi/vim have no foot guns.

Perhaps this comparison is slightly unfair to git - vi is much simpler insofar as a modal interface is a simpler concept to grok than directed acyclic graphs. So they're not directly comparable, since vi's main barrier to learning is memorising commands and git's is conceptual. However - as has been mentioned elsewhere - hg makes a fine candidate in that regard.


> If this is the "correct" way to learn, the problems are not with the user.

OP's advice was essentially to first learn how to use a tool in a sandbox before screwing up mission-critical services. I really don't understand why you perceived the need to learn how to use a tool in a sandbox as a problem that lies somewhere other than with the clueless newbie who's taking his first steps.

Letting toddlers learn how to ride a bike by installing training wheels doesn't mean that bicycles are flawed or poorly designed.


And still, while people cry about the UI (which is the irrelevant part), other people just take the time and learn git, by whatever means.

PS: throwing out the things you test on - isn't that how every programming tutorial works? Write code NOT in your main repository, test things out, throw it away (or keep it, whatever).


> UI (which is irrelevant part)

UI usability is irrelevant... ?!??

> isn't that how every programming tutorial works

Firstly, learning programming and learning to use a tool that is a component of your workflow are two independent things. The latter should generally (ideally) have a much lower (aiming towards zero) learning curve. Yes, this is possible, with good UI design.

Secondly, even programming language designers strive towards lowering this learning curve. There is, imo, a necessary complexity, or "table stakes", for any reasonably useful language, but it's still very evident to any language designer that this is a trade-off. Usability is desirable.

For a tool as popular and essential as Git, usability should be a much more central goal than it seems to have been in the past.


Is writing programs and building them a part of your workflow? Do you even spend time on setting up your build? On acquiring an actual understanding of how it works?

Other tools are no different. If it's an important tool, it pays to actually study it. Do not expect tools to do what you mean before you know well what you mean.


I think you're misinterpreting my post. I completely agree, you're absolutely right, but there's a huge difference between what you're saying and what the GP was saying: that usability should not be a consideration for these tools.

I do understand, in quite a bit of depth (which I'm sure is still far from complete), how Git works. I still prefer to use Mercurial because it doesn't require me to be continuously aware of the dangers of those internals on a day-to-day basis - it doesn't supply me with a loaded foot gun. Sadly, I use Git every day, and rarely Mercurial, because that's what the masses have chosen and I've found using in-between compat tools not to be worth the extra hassle for me.


OK, I can agree: git is not as safe, mistake-proof, or consistent as it could be.

What I like about git is that it gives a simple algebra of patches (diffs), and allows almost any sensible operation over them.

It would be great to build a different CLI on top of the same data structures (and maybe the same plumbing) that would expose a more consistent and usable interface, while allowing for [most of] the same capabilities, and preserving interoperability. I suppose git-hg tries to achieve something similar, too.

For me, much of the pain of git CLI was alleviated first by a set of aliases, and then by using magit from within emacs.


With a programming language, if you open a new file A in the same repo and try out some stuff, and it doesn't turn out well, you can just delete the file and be reasonably sure that it doesn't fuck up files B, C and D in the same directory.


No, but if occasionally the furniture kicks you, then it could be improved on. Pointing out flaws in the best of the best is very worthwhile; it's in that pain space that the biggest usability gains are to be had.


> when you accidentally kick a corner of a furniture, that does not make that furniture "dangerous".

When the piece of furniture has a serrated corner that cuts deep into your toe, you'll probably reevaluate the appropriateness of your metaphor.


Oh, so if you are careless, that's the furniture's fault? Okay. Your comment is not the best counter to my point either.

PS: I have a lot of furniture with rough edges. I've learned my lesson, and furniture is not a problem anymore.


> correct way of learning git is every time you want to use a "complex"/dangerous commands in git, you `cd /tmp` and create a git repository there. then create some commits, branches and experiment. you will NOT lose anything, but you will learn a lot.

What if your repo with full history is about 30gb?


Why are you learning how to use git by testing commands on a 30gb repo?


    If you requested save in your favorite GUI application,
    text editor, etc and it popped open a select the 
    changes you would like to save dialog, you would 
    rightly think just save all my changes already, 
    dammit
I'm sympathetic to what this is asking, but I have to feel that the current prompt leads to much better practices for many people. I'd wager a ton of folks would find themselves asking "why in the world does it think I changed that?" more often than they would care to acknowledge.

That is, by and large, the quick sanity check of "these are the things you did?" is actually quite valuable. Regardless of any annoyance it may give. Similar to the "did you mean to print all 500 pages?" warning you can get if you print something huge. Sometimes, yes. Yes I did. Often times, no, not what I intended.


In my experience, the benefit comes from giving you an editor for the commit message, which includes a list of the files changed, and which can be cancelled in some way to abort the commit. Both hg and git can do this, so it's not necessary to have a separate command and staging area just to do it.


List of files is not enough, I always do 'git add -p' to see the actual changes. And sometimes I detect an error, and I want to fix it and then keep going without having to re-evaluate the changes I had already seen.


The author of this article suggests a solution - `git commit --interactive` could prompt you for each change like `git add -p` does, without needing a staging area. You're right though that it would take some cleverness in that command to allow you to fix a typo without restarting the operation.


> You're right though that it would take some cleverness in that command to allow you to fix a typo without restarting the operation.

Would it though? Just finish your commit ("q" to skip the current hunk and every following one), then use commit --amend --interactive for the rest of the changes.


My preference is for `git commit --verbose`, which shows all changes in a 'commented-out' section of the commit message file.


Also you still haven't pushed the commit and could undo it.


Sounds like the OP would prefer the Gitless[1] porcelain; it removes the concept of staged files, and significantly simplifies the number of concepts required to work with Git.

(This is a fun project because there's also a paper[2] that analyzes the design of Git from a conceptual level, and comes up with the simplifications that Gitless implements.)

[1]: http://gitless.com/ [2]: https://blog.acolyer.org/2016/10/24/whats-wrong-with-git-a-c...


One source control anti-pattern I've run into is people thinking a commit is a 'save'. It's not. A commit should move you from one working state of your code to another. Compare that to saving my files, which I do compulsively every 5 minutes or so regardless of the state the file is in.


I agree, but... It is nice that one can compulsively (or automatically) save, with full history. And then when things are in a nice working state, be able to distill this into one or more clean changesets.

`git rebase -i` makes this relatively easy/nice. I think even better tools for this would make it way more realistic to get away from the staging area as step in the default 'commit' process.


I agree it's not a save, but a commit doesn't have to be in a working state either, at least on a private branch. A commit is a rollback point which may not even compile.


So then why do I have the ability to stage individual lines of changes, creating commits that do not reflect a state that my system ever had, ergo was never tested or compiled in, thus can not be known to be working?

Also, what if I use a remote to synchronize the same code base between the different machines I work on?


There's three ways to deal with the git index/staging area:

- always commit everything, merge/push that (Mercurial style)

- always commit everything, then eventually do a rebase where you merge/split commits into logical units

- always git add -e then commit logical units

I do mostly the latter, but since I don't usually write code the way I want it in the end before pushing, I almost always have to go back and rebase to reorder, merge, and/or split commits prior to pushing.

Like you, I'm sympathetic to the idea of "save everything", and I emphatically agree that it's fantastic that you get to save just the changes you want because it allows one to produce minimal and logical commits.

I suspect that the author of TFA does not produce nice, clean, logical, and minimal commits. Instead I bet their upstream is full of merge commits and the merged branches are full of huge commits with lots of unrelated changes, and/or full of "fix a typo" commits that should never appear in upstream histories.

Using a VCS effectively to create clean, useful history upstream is hard, and some VCSes make it harder than others. Mercurial (and Fossil, and...) make it supremely difficult. Git makes it easy (though one does have a bit of a learning curve to get there).

Ironically, all the Merkle hash tree VCSes are git-like under the covers: there's a bag of commits, and some sort of branch name resolution. "Heavy-duty branches", like Mercurial's, are extremely confusing -- I never know how to recover from having multiple tips, and I don't even understand why that's at all possible. Whereas "light-weight" branching (like Mercurial's bookmarks, which still aren't reliable, or like git's branches, which are) is much much simpler -- even Mercurial's community gets this, though they still shy away from light-weight branches for some reason that I still don't understand.

Fossil is exactly like git under the covers, except far, FAR superior to git in that it's relational -- Fossil is easily the best VCS ever in terms of implementation. But Fossil's dogma is highly opinionated and Mercurial-like: it has light-weight branches, but the UI is merge-happy like Mercurial's, and there's no rebasing (there's basic cherry-picking functionality, which of course is always the basis of rebasing, but that's it).

I really don't want a Merkle hash tree VCS to hide its tree nature from me.


A goal of producing a "nice, clean, logical, and minimal" history, as opposed to preserving history, is very opinionated, as I'm sure you're aware. I think it's not something most people should attempt voluntarily - If you want "nice, clean" waypoints, just diff between the relevant merge commits, and read their commit messages.


Yes, but git doesn't force you to do this -- you can be as messy as you like with git, knock yourself out. Git is NOT opinionated. I am opinionated. But Mercurial and others don't make it possible (or easy) to get nice, clean history, therefore they are inferior.

And no, "just diff between merges" is NOT nice and clean. It bundles lots of changes together, therefore you can't easily separate them without actually understanding all of it. The cognitive burden of reading diffs is high. The cognitive burden of reading commit subjects and messages is lower -- much much lower.

One really does have to be considerate of the people who will be maintaining one's code in the future. Even if that will be you yourself, you should still be considerate to your future self.

When you work with codebases measured in Mloc or Gloc, you really need the code and its history to be accessible. This is obvious to people who have worked with such codebases. It's less obvious to people who haven't, but it's still true.


I parted company with the OP at that point. If he considers staging a "power user" feature then we're on two different planets.


I'm surprised by the staging-area hate. Does it really violate people's assumptions? I like the ability to make a big, complex change and checkpoint stable portions (subsets) of the work as I go.


Coming from someone who learned hg well before git, and who's now being more or less forced to use git long after developing comfortable hg workflows, the staging area feels like a half-baked implementation of what it's supposed to be doing.

I'm used to thinking of commits as atomic commits--roughly, each commit is the smallest change that atomically makes sense. So you should be able to use the staging area to build up that commit as you find more pieces to put in it. But while I'm slowly hacking away at this commit, I talk with someone else and decide to try an idea which turns out to be a small commit. But I've already got this half-built-up commit that's staged, so I need to commit that to make the new commit, and then somehow reverse the patches and restage the commit (which is something that's well outside my git comfort level).

What hg has is a few different features that make it possible to build up not just a single commit but an entire commit sequence and do operations on that commit sequence (like reorder them). The git staging area is kind of a weak version of that, but it ends up being in limbo: too complex for the simple just-commit-everything model but too simple for try-to-craft-the-public-changelog model.

Edit: the model I adapted to from my hg workflows is effectively commit --amend. Mercurial has an interesting way of doing history tracking such that if I really need to, I can actually follow the history of the "oops, I need to change this commit because I missed a compiler failure" or "I forgot to save this file before committing," which is a feature that git doesn't have. If you promoted amending the last commit, you wouldn't need a staging area to build up a commit; the last commit does it for you already.


Some useful commands I'd use in scenarios like this are:

  git add -p # select what to add to the staging area
  git reset -p # deselect chunks that I decided I don't want anymore
  git stash # saves your working state in a temporary commit (not in your branch)
  git stash pop # restores the working state from the last git stash command and drops the temporary commit
  git rebase -i $start_point # where $start_point is either a branch or some commit in your branch history. -i means interactive, which allows you to reorder/edit/reword/squash commits
  git checkout -b $newbranch $start_point # you can make a new branch at any commit in history, so if you decide to reorder your commits, then declare and older commit as a branch and get it merged first, that's fine. $start_point is optional, if unspecified the current HEAD commit is used.
For your specific example, with a staged partial commit on branch idea-1, I would:

  git commit -m 'WIP idea 1'
  git checkout -b idea-2 HEAD~1 # makes a new branch at the parent of the HEAD commit
  git add -p # pull in the desired changes
  git commit -m 'idea2 implementation'
  git checkout idea-1
  git add -p # stage the rest of the changes
  git commit --amend
  git rebase idea-2 # if idea 1 depends on idea 2
I do wish git had some notion of sub-commits with their own messages, and a better UX for cleaning up a set of commits before merging them.


> I do wish git had some notion of sub-commits with their own messages

You could use the "fast-forward with merge commit" style. [1] Then you can treat the commits from the branch as sub-commits, and the merge commit as the parent commit.

[1]: https://stackoverflow.com/questions/15631890/how-to-achieve-...


Interesting! That's almost what I want, except here's the specific behavior I want to achieve:

  A clean git log view that shows only meta-commits
  Git blame shows both the meta-commit and the sub-commit for each line
I suppose I could get creative by enforcing that tags be embedded in commit messages and then filtering them in the log view, but it would be better if it was standard.


You can show only merge commits with

    git log --merges
However, I don't know if there's any way to achieve that with git blame, and not all tools may offer that kind of log view.


> I do wish git had some notion of sub-commits with their own messages, and a better UX for cleaning up a set of commits before merging them.

Does rebase -i handle this adequately or am I misunderstanding what you're describing here? My workflow sounds like exactly what you're saying here, where "subcommits" are my development commits and the UX is the rebase -i editor.


See my reply to mmebane about sub-commits. Re: UX, specifically the patch editor (git add/reset -p) is sometimes difficult to use. For example, I've had issues with changes that happen around empty lines refusing to apply for unknown reasons.


With git, you can stage line by line, file by file. You don't need to commit everything at once. Also, you can add the lines you want into staging, then use the stash feature to temporarily save lines that you don't want in the commit. This allows you to test the commit without all of the extra lines you don't want.

I would highly suggest reading Pro Git [0], a well-written book on not only how to use git, but also the best practices, common workflows, and some of the internals of how git works. Many refer to it as the essential book on git.

[0]: https://git-scm.com/book/en/v2


I don't know mercurial, and I agree about the limitation you mention for stashing the index. That said if I need to do a refactoring or something else while I have an unclean tree, I stash before doing it, then commit it, and then unstash my work.

> What hg has is a few different features that make it possible to build up not just a single commit but an entire commit sequence and do operations on that commit sequence (like reorder them).

Again, I've never used hg (beyond a simple tutorial), but you can commit frequently with git (every few other changes "WIP") and reorder / fuse / edit later through interactive rebase.


I think people generally get side-tracked by focusing too much on the staging area. It's a red herring in my opinion. I barely notice that it's there, and I do the crafting-the-public-changelog thing on an almost daily basis.

I recommend learning about the combination of `git gui` (including its amend option), `git rebase -i`, and `git commit --squash=/--fixup=` (and don't forget to set `git config --global rebase.autosquash true`).
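
For reference, that fixup flow looks roughly like this (the sha and the base are placeholders):

  git config --global rebase.autosquash true
  git commit --fixup=<sha>      # records a "fixup! ..." commit targeting <sha>
  git rebase -i origin/master   # autosquash pre-orders the fixup under its target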

The workflow in Git could still be improved: `git gui` should really allow you to create "squash!" commits more comfortably, for example (I often end up staging with `git gui` and committing on the command-line).

Also, it would be awesome to have an editor in which you could make edits to selected commits directly in a rebase-like fashion. But I don't think that exists for any version control system out there, and getting the UI right for dealing with conflicts would be quite challenging.


Resorting to rebase is a pretty heavy price to pay though.


In my experience, there's no such heavy price as long as you don't rebase what you've already shared with others.


How so? It's very fast and you can abort if you do it wrong.


Well, the obvious and immediately painful one is the hard-to-recover failure mode when you discover something you rebased wasn't private after all. (This can happen in many situations, e.g. the GitHub merge button, even without other people involved.)

The other is that it destroys history: you change the record of what actually happened into a version that is a plausible simplification in the opinion of git's diff/merge heuristic. You then can't go back to look for explanations of bugs or to test code changes: which change happened first, whether a fact claimed by a commit message was really true at the time, or whether some change was a merge mess-up or a considered change.

Rebase also really complicates the mental model of git you have to work with.

I appreciate though, that there are cases where you want to withhold the record of what actually happened and use squash/rebase, it's a type of privacy from others. But simply grouping together commits into one is not a good reason to do it, that's what merge commits are for after all.


> Well, the obvious and immediately painful one is the hard-to-recover failure mode when you discover something you rebased wasn't private after all.

How is this hard to recover? You have full control of both copies, the system protects you against data loss, and you can easily pick one of the two, rebase one against the other, etc. to recover.

> Which can happen in many situations, eg github merge button, even without other people involved

This is the real problem and it's DVCS agnostic. If you make a bunch of changes to the same code without staying current, a human is going to have to reconcile it. If you follow recommended practice and update your local changes against the shared upstream regularly, this is a far more manageable problem — and that's true for every version control system in existence.

> The other is that it destroys history, you change the record of what actually happened to the version that is a plausible simplification in the opinion of git's diff/merge heuristic.

More correctly, you change it to the version as presented by the human who made the decision to rebase. If someone chooses to remove important context you can have problems but that's the same category of social problem you'd have with someone who uses poor commit messages, makes commits which are incomplete, etc.

> Rebase also really complicates the mental model of git you have to work with.

I find the opposite to be true. Most of the people I've taught seem to quickly grasp the idea that a rebase is simply taking your set of changes and moving them to apply against the current shared consensus rather than that state when you started, whereas merges cause regular confusion during code review or conflicts when people are asked to reason about changes someone else made.


Stashes should cover your first scenario; put those changes (both staged and unstaged, optionally also untracked files) in a bag and re-apply them when you are back from your short expedition: https://git-scm.com/docs/git-stash

To clean up outgoing changes, you can run an interactive rebase of the current branch onto its remote tracking counterpart. This will list all affected commits in a text editor, and allows you to reorder, squash, change the commit message or the content, much like how you describe it works in hg: https://git-scm.com/docs/git-rebase#_interactive_mode


Problem with stashing is that you then need to redo the work of staging afterward.


You only need to git stash pop and the staging area is back as it was; there is no need to redo the work of staging.


you can --keep-index to avoid stashing it, and then make an additional stash. If you have modifications within the same file it'll complain about conflicts though, so on the second stash you'll have to `git stash show -p | git apply -R` and then drop the stash, which is super clunky.


Stashing saves/restores the staging area so you shouldn't need to redo anything.


> Edit: the model I adapted to from my hg workflows is effectively commit --amend. Mercurial has an interesting way of doing history tracking such that if I really need to, I can actually follow the history of the "oops, I need to change this commit because I missed a compiler failure" or "I forgot to save this file before committing," which is a feature that git doesn't have. If you promoted amending the last commit, you wouldn't need a staging area to build up a commit; the last commit does it for you already.

I'm confused about what you think --amend is lacking. If you want a full history including amendments, why not make real commits? In case you need it git can track the original edit-showing history of something too, it's just awkward. Do you think more people would use it if it was easy?

Also if you really want to uncommit and restage it's a simple command to write down. (reset --soft HEAD^)


The staging area is trivial to use effectively.

Just:

- edit files
- git add -e
- optionally git diff --staged (to review what will go in a commit)
- optionally lather, rinse, repeat as needed
- git commit

Alternatively:

- edit files
- git add
- git commit
- lather, rinse, repeat
- git rebase -i origin/master (or whatever) and reorder/squash/edit/split commits as needed

For the latter's last step, sometimes the easiest thing to do is to squash all commits when you're done, then git reset HEAD^, then apply the git add -e && git commit loop in order to break up your work into logical commits.
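
In other words, roughly:

  git rebase -i origin/master   # squash everything into a single commit
  git reset HEAD^               # drop the commit, keeping all changes in the worktree
  git add -e && git commit      # repeat until the work is split into logical commits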

When done, push.


> But while I'm slowly hacking away at this commit, I talked with someone else and decided to try an idea which turns out to be a small commit. But I've already got this half-built-up commit that's staged, so I need to commit that to make the new commit, and then somehow reverse the patches and restage the commit (which is something that's well outside my git comfort level).

Git stash

> What hg has is a few different features that make it possible to build up not just a single commit but an entire commit sequence and do operations on that commit sequence (like reorder them). The git staging area is kind of a weak version of that...

The git way is to make a new branch and then use git rebase -i


You should look up the stash command.


Partial commits through piecemeal staging are great as a power user's tool, like rebase. But it's beyond weird that anyone thought it was a good idea to make people deal with staging for every commit.

It doesn't even have to go away. (And on that note, I'm bummed that the conclusion in the article is to do that, especially because every time this comes up, there's an uproar.) But, jeez. Just tuck it away so futzing with it is an opt-in decision for the 5% of the time that you need to make sure you aren't committing all of your changes. (And on note #2, if a person is doing partially staged changes more often than that, my question is why? What kind of chaotic, stream-of-consciousness style of development leads to that, and is it even producing good results? It doesn't take especially careful planning to not end up in the scenarios where partially committed changes are the answer.)


Often I'll work on a large change and make sure that it is working before I start crafting my commits. By the time I get to that point, I realize that I've actually made three separate subchanges so I do partial stages to create each commit.

I use git status -s and git diff quite a bit as a sanity check (for things like whitespace) and to do first party code review before I send anything out.

The staging area is a core part of my personal git workflow pre-commit. It seems to be producing good results so far.


> I realize that I've actually made three separate subchanges so I do partial stages to create each commit.

You can trivially replicate that without staging by amending the HEAD commit, or building a bunch of partial commits then folding them with rebase -i. You can even do that automatically by committing with --fixup or --squash.


That assumes that I'm making multiple smaller commits along the way. I'm not. Without staging, I'm stuck making one big commit. Splitting a single big commit into multiple smaller commits with an interactive rebase is dependent upon the staging area too.


> That assumes that I'm making multiple smaller commits along the way. I'm not. Without staging, I'm stuck making one big commit.

Errr… no you're not. `commit -p` exists.


I have ADHD, so often as I am working on one feature, I will begin accidentally working on another unrelated feature along side it. I rely on staging a lot to ensure that each commit has only the parts that it needs.


Same here, I actually find `git add -p` to be very useful.


commit also has a -p flag.


Oh wow. No need for me to use git add ever again!

EDIT: I suppose I have to use it to add new files :)


Just use `git add -i` to mix and match :)


For my first couple _years_ with Git, I just passed in the -a flag on every commit. This is a pretty easy, low-effort habit to form, so this reduces to a question of which default to set. Git's approach seems to be that the default should be to nudge users toward power usage, which I don't think is necessarily a bad thing. (Note that git's approach to defaults in other areas, like potentially destructive actions, is actually a problem.)


If you consider the use case "linux kernel development", you realize that you really don't want to automatically commit all your changes.


...and why is kernel development any different to any other large scale project?


I do consider the LKML-centric use case. (Often—probably more often than most.) No such realization is setting in. How about you elaborate?


Git doesn't actually make people deal with staging for every commit, though.

It's quite common to tell newcomers to always `git commit -a`, and `git add` for adding new files. That's pretty much exactly where SVN is/was, and nobody complained about that being complicated or onerous either.


You probably want "git add -N"; otherwise "git diff" won't work as expected.


As the author points out, it is an unnecessary primitive, once you learn about things like commit --amend, which you will inevitably eventually learn about anyway. Adding redundant ways to do the same thing is not the way to make a tool approachable.

The basic problem is that git was designed to solve Linus’s problems, and Linus’s problems are not your problems. However, because of network effects, most of us end up using the same tool.


How does commit --amend in any way solve the same problem as the staging area?

If I’ve got a big change which I want to split into three small internally consistent changes, with each change overlapping in many files, how in the world would --amend help?


It solves the problem I was directly replying to:

> I like the ability to make a big, complex change and checkpoint stable portions (subsets) of the work as I go.

However, it does not replace all use cases of the staging area, which is why I gave it only as an example.

Of course, `commit --amend` with `-p` could solve your problem as well.

Alternatively, "create a new, 'pending' commit, without a commit message, don't point HEAD to it yet, and change most of the commit-related commands to modify this commit until it is finalized" also solves all the use cases of the staging area, because that is what the staging area is, but it is presented in a much more confusing way and ends up being yet another new concept for git beginners to grasp.


Fair points.

After thinking about this more, I find myself agreeing that in terms of beginner friendliness, the combination of incredible name confusion and the relative paucity of use cases where it’s necessary make the staging area an interesting area for significant change in the git defaults.


90% of VCS commits are simple enough to not need staging in my experience. That you can stage your changes does not mean that you need to be forced to do that. This is the point the article tries to make: streamline the workflow, but do not remove the feature for power users (people like you).


I believe there's a flag to `commit` that effectively does `add .` first.

I disagree with your statistic though - I can't tell you what the flag is precisely because it's almost never what I want; so I never use it and haven't learnt it.

I think these things just vary with workflow.


Just for completeness sake, the command to do that would be

  git commit -a
It does an implicit git add -u (which does not add new files not yet part of the repo)


I record my uses of each command and the current counts are the following (g is my shell alias for git):

    161 g cap      # commit --patch --amend
    170 g adu      # add -u
    205 g add -p   # add --patch
    277 g ci       # commit -a
    484 g cip      # commit --patch
At least, I use it a lot. ;)


So you disagree based on your personal experience? That is fine. I base my observation on what the people around me were using in various environments, so I am actually fairly confident that this can be generalized. This is the best I can do unless someone makes some representative statistics over a broad user base.


But why not use a mechanism that already exists - local commits. That way you also get a complete history of changes (including rollbacks), can write incremental commit messages, and can squash everything partially or completely at the end.


I made myself a few aliases to use this approach. After a few months of use I can say staging is practically unnecessary for me. There were a few edge cases, but maybe I was just falling back to known solutions. I'm in favor of dropping staging.


How do you test the thing you're actually committing without everything else in your working tree contaminating your test?

There are ways to do this using stash but if you stash before you commit, you get merge conflicts when you pop the stash. The workaround I've seen is to commit before you stash, but that means you're also committing before you've actually tested that change, which seems pretty broken, IMHO.
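
For reference, the stash-based version people usually mean (with exactly the pop-conflict caveat above; `make test` is a stand-in for whatever your test command is):

  git stash push --keep-index   # remove unstaged changes from the worktree; staged ones remain
  make test                     # the worktree now matches exactly what's staged
  git stash pop                 # bring the rest back (this is where conflicts can appear)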


I've created a script for this, it applies the staged changes to a fresh clone and executes the command you specify. To quote from --help:

If you have lots of unrelated uncommitted changes in the current repository and want to split up the commit, how can you easily check that the changes pass the test suite? With all the other unrelated changes it can be hard to make sure that only relevant changes become part of the commit, and that they don't result in regressions. This script clones the repository to the directory ".testadd.tmp" in the current directory and applies the staged changes there (unless -u/--unmodified or -p/--pristine is specified), chdirs to the same relative directory in the clone and executes the command specified on the command line there.

It's available from

https://github.com/sunny256/utils/blob/master/git-testadd

and the test suite (place it in a subdirectory below the script) is at

https://github.com/sunny256/utils/blob/master/tests/git-test...

It's stable, and I've been using it almost every day for 1.5 years. It also works with binary files.


Commits are just a bit of local data. Until you push it out somewhere public, it's all just a big mushy palette to do with as you will.


Create a new branch for each issue where you create broken untested garbage commits with meaningless messages like "Fix".

When you're done:

git checkout master

git merge --squash <branch_name>

git commit -m "<Real commit message here>"


My problem with the staging area is it produces commits which never "exist" (that is, that code was probably never compiled/tested in isolation).

I'd be happier if the staging area was handled at the filesystem level, give me a directory which at all times contains the staging area, so I can test it.


Your CI system can test those commits in isolation. I frequently push supposed-to-be-stable commits like that alone to my working branch on Github, so Travis CI can tell me if I have a problem. Though I'm quite picky about commits, what matters most to me is that each merge point into mainline (master) is tested. Github PRs with Travis CI, for instance, enforce that pretty OK. If I don't bother to do a PR, and the changes sit as a linear set upon master, I like to use git merge --no-ff master to force a merge commit.


To me staging is the best part of git. I don't want to commit all the time so I slowly stage changes that I think are OK. Once the feature is done I can do a full commit. Perfect!


The author suggests that this can be achieved without staging. For example, you could slowly commit small changes that you think are OK on a work-in-progress feature branch with `git commit -p --allow-empty-message`. When you are done you could squash-rebase and rewrite the message.

While currently this workflow is clunky, it could be made to work with the same nice semantics of your workflow, and not have a staging area at all.
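
A sketch of what that could look like today (branch and base names are made up):

  git checkout -b wip-feature
  git commit -p --allow-empty-message -m ""   # repeatedly "stage" hunks as tiny commits
  git rebase -i master                        # squash them and write the real message at the end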


Sure. I can get the same result I can get with simple staging with a complicated squash workflow. But why would I do that?

I have seen people get in trouble with rebasing and generally while manipulating history but never with the staging area.


> I have seen people get in trouble with rebasing and generally while manipulating history but never with the staging area.

I think that's what mercurial tries to solve, it makes manipulating history safe (by tracking the state of the commits, whether they were published or not for example, and always keeping all the history but hiding it after it was rewritten).


I should add. I always use a GUI like SourceTree for staging. With the command line it's much less fun.


The thing is that you don't need a staging area for that. The staging area is an unnecessarily complicated way to address this particular problem.

More importantly, the article is about the staging area being there by default.


This, exactly. What's so hard about one more command? If you really want a "save all", just `git commit -a`.


The Git staging area doesn't have to be this complicated. A re-branding away from index to staging area would go a long way. Adding an alias from git diff --staged to git diff --cached and removing references to the cache from common user commands would make a lot of sense and reduce end-user confusion.

--staged is already an alias for --cached in the latest Git version.


And it has been for some time (years, I think)


Years, yes. At least since 2.0, though probably longer.


It's clear that there will be a successor to Git some day, in the sense that Git is a successor to SVN (yes, I know Linus's viewpoints on SVN). But the successor won't be a "better Git" just like Git isn't a "better SVN".

The driving features of Git's successor will be unrelated to Git UI gripes. If Git's UI gripes were important enough, people would just be using Mercurial (which has its own quirks). The biggest problems I've personally seen with Git are the same problems you see with other VCS, where two people change the same file, and one commits the changes first. Now you have to merge the changes and sometimes it's a pain in the ass when you just want to submit your changes and go home. A better VCS isn't going to solve that!

I'm still glad that Git delivered me from all the merging problems that SVN had, although I've heard that SVN is a little better these days.


Personally, the problems I have with git are related to the fact that none of my coworkers use git. If I mess up my repo, I don't have anyone to turn to. The experience with Mercurial is exactly the opposite. I never needed to look for "experts" to solve a problem in Mercurial. hg help is all I needed. Consistent UX and a safe and powerful VCS is all we need.


> I never needed to look for "experts" to solve a problem in mercurial. hg help is all I needed.

Git has good documentation too. As long as we're doing personal experience, here's mine: I learned git from `git help` too (well, `git --help` and `git <command> --help`), and I never needed to look for a git expert in my team for advanced operations; I did have to be one for others quite a few times, but when it wasn't an immediate, pressing concern I could just point to the appropriate manual.


Does anyone who's downvoting this comment care to explain why?


I did not downvote the parent comment but I find

> Git has good documentation too.

hilarious. Git has a sprawling mess of disorganised man pages. I find calling that "good documentation" insulting to people who care about documenting things.



>Git has a sprawling mess of disorganised man pages.

This is extremely unfounded. You're basically just making shit up with baseless platitudes.


Because git does not have good documentation.

Perhaps that's unfair to the documentation in that it does a good job of detailing commands and their options, but is urinating into a hurricane as far as trying to convey any notion of conceptual integrity for a piece of software that has none.


What about https://git-scm.com/doc ?

I wish all the software I use had a documentation half as good as this one.


I guess low expectations are still expectations?


> A better VCS isn't going to solve that!

It might!

I feel like a "language aware" VCS would be able to help a lot in this area.

I know there are 3rd party tools that can do this, but if a VCS came along that natively knew about some languages it could unlock some really cool features.

Imagine instead of your VCS storing the text source, it stores the AST! Language aware means it can also integrate tightly with language package managers.


That's not really a "better VCS" that's just a better 3-way merge. You can already plug your own 3-way merge into Git and other VCS.


Can you recommend any language aware 3-way merge tools?


Semantic Merge comes to mind, although it only does an AST merge at the structural level while blocks are still merged traditionally (I guess it's a good tradeoff of complexity, though). In my recent test, however, it proved rather slow with large merges (or large files; the source file I tried to merge had ~4 kLOC). I was merging a major version of one product into derived ones, with lots of refactoring, so I thought SM could help.


> but if a VCS came along that natively knew about some languages it could unlock some really cool features

Absolutely, although hopefully it'll be implemented using langserver or similar.

Conversely, I think programming languages (and code- and configuration-generators) of the future will be judged in part on how VCS-friendly they are. A few smart decisions when designing syntax could make diffs smaller and conflicts easier to resolve. (This kind of change would, I am convinced, also make experimental changes to code easier--think rewriting SQL queries to make them faster.)

For now, I've been adapting my own coding style over time to be more VCS friendly. It's made my life easier and has indeed made making experimental changes easier to boot.


This discussion won't see much more now that it's off the front page, but is there any work towards full undo support integrating with version control or even at the IDE level?

There's a lot of stuff that doesn't get checked in; I wouldn't necessarily want it all in the official history but it would be amazing to have a time-searchable view of the project's entire development story for my own personal review/rescue operations.



https://www.semanticmerge.com supports .NET languages, C/C++, and Java (with support for JavaScript planned). I would be interested to hear any anecdotes from users of this product!

While struggling to search for this tool, I discovered https://www.gmaster.io: a beta Git client (for Windows) from the same company extending these ideas closer to the VCS level.

Codice Software's main product appears to be PlasticSCM; they've done a bit of marketing here on HN without catching much interest.


> I feel like a "language aware" VCS would be able to help a lot in this area.

VCS would have to be much more than "language aware" to even begin to help in this area.


Yes, one would need an AGI for that.


That successor ought to be Fossil, by all rights. What keeps me from using Fossil is its opinionated UI.

Lots of people hate git's UI, but it lets you do all sorts of workflows. You can rebase a lot, or merge a lot, or both. With Fossil you're pretty much stuck with a merge workflow.

What makes Fossil superior to all other VCSes is that internally it is a Merkle hash tree organized relationally, with an internal SQL interface. This makes it (a) much easier to develop tools for than for any other VCS, and (b) much easier to write new kinds of queries and make them perform well.

Perhaps some other VCS will come along that will do what Fossil does, but with a less opinionated UI.

I understand why Fossil is opinionated: it's not meant to be a VCS to take over the world, but instead just a VCS for SQLite3 and related projects. As such its developers don't really care for any functionality that they themselves don't care for, and thus my need for rebasing, and a staging area / index, is as nothing to them. I can't blame them, really, but it would be truly fantastic if we could support rebase workflows with Fossil.

Another thing I don't like about Fossil is its approach to repo sync. I like that with git I can push/fetch individual branches/tags, some of them, or all of them. With Fossil you don't get much choice, as it's all public branches or nothing.

Power users (and I'm not saying here that Fossil's aren't) need more UI power. To appeal more widely, a VCS has to support more power users.


As the main developer on the project and the resident guru, I ask everyone working on this project to create a branch for their work and just drop me a message: a pull request. This is a feature incompatible with some git workflows, like the Linux kernel's, but it finds wide adoption in git productivity tools like GitLab and GitHub.


I mean, you could just checkout a new branch and push that, right?


Then it won't make it into production.


- "submit your changes and go home"

- "production"

The merge should happen far before it hits production.

Maybe I'm taking you too literally or something.

Do you have a CI/CD pipeline?


Workflow for the team I’m on right now:

* Make changes

* Run tests

* Code review

* Submit changes

CI pipeline will run larger integration tests, make releases, and deploy to production. We can cherry pick commits to fix bugs at this point. The pipeline will roll out releases to the development environment on a nightly schedule and to production on a weekly using canaries.

You can’t submit if your changes don’t merge cleanly. You’ll need to fix the changes and re-run tests (can be slow) at the minimum.

If you don’t submit by end of Wednesday you have to wait another week.


Submit means "pushing to master" or "pushing feature branch"?


Pushing to master. We don't really use feature branches here. New features are generally hidden behind flags, eventually enabled by default, and then the competing / legacy code paths are deleted.


You could work in branches and merge after feature freeze, or you could, you know, collaborate and communicate intent before beginning to edit.


This is a really interesting piece that introduced a number of ideas I hadn't ever thought deeply about, particularly in the section on nameless workflow. (I do think of myself as reasonably capable with both Mercurial and Git.)

If your instinctive response is to defend Git, boost Mercurial, ridicule people who can't use their tools, ridicule the tools for being unusable, or whatever -- suppress it for a while and give this article some thoughtful attention.


My instinctive response is to explain why git is easier. TFA is very unfortunate. I think the author is missing out on a better experience, and their git hate isn't helping them.


Let me suggest to suppress that instinct and read the article again. There's not a shred of "git hate". Quite the contrary: The author is a Git user himself and has detailed suggestions to improve Git and its ecosystem. That is a good thing, not "very unfortunate". Also, Git is easier than what exactly? Than the proposed improvements? I suspect you haven't read the article. This is neither a "Mercurial vs Git" nor a "Git sucks" piece, which you should have noticed while reading.


Lots of food for thought in this article, but the part I find most interesting is his distinction between "soft" and "hard" forks. (Viz., are you "forking" in order to collaborate or in order to go your own way?)

If collaborating, it would be nice to lower the barrier to participation ... more like Wikipedia. I know that I often don't bother to submit a PR for small changes because of the overhead of setup. Whereas I fairly often make small edits to Wikipedia, because it's so easy.


This is one of the few things that Launchpad got right. jquery, for example, would be at launchpad.net/jquery, and your own work would hang off of that. GitHub went with inverted namespaces (user/repo) instead, where every personal fork is an island unto itself, and the world is worse off for it.

I had hoped that GitLab would take the opportunity to fix most of the problems GitHub introduced (or at least a few), but they seem to be interested in just copying the mistakes and chasing the market segment made up of people whose only criterion begins and ends with "like GitHub, but on-prem".


GitHub at least has a simple way of making small changes: You can simply click 'Edit' on any file, which will make a fork, commit and pull request in one operation. It's only good for single-file changes you're confident to make in a simple web text editor, though. I've only used it for typos.


I don't like that doing so frequently litters your personal repository list with forks. Would be nicer if they allowed arbitrary users to push to branches like `incoming/<username>/<branchname>` in the original repo.


Isn't that up to the repository maintainer whether they allow pushes from other people? And sure, it litters your list with forks, but you can remove them as soon as the PR is merged. And that's the case for pretty much anything you contribute to (unless you can push there, of course).


No, there is no process in Github to allow everyone to push to a certain branch (or branches matching some regex). Github forces people without collaborator rights to use forks.

I've seen this lead to confusion where people fork a repo and send PRs even though they have contributor rights, because they didn't understand the difference. As an extreme example: https://github.com/flystack/misty/pull/101 - This is a repo with only one person with write access [1]. Yet this same person only commits to their private fork of this repo and sends themselves PRs that they then immediately merge. I don't mean to blame the developer, only the Github UX.

[1] It says "6 contributors" on the main page of the repo, but four of these are from my team and we definitely don't have write access.


> Isn't that up to the repository maintainer whether they allow pushes from other people?

That's an orthogonal concern. What they're saying is what the essay suggests at the end: Github could maintain a list of "contributor refs" in the source repository in the same way they expose PR refs (refs/pull/*/head).

This could be managed entirely by the service provider (e.g. github) and invisible to the repository maintainer until a PR is created.


I don't know why Github does not make this easier. What I would like to do is:

    git clone https://github.com/mozilla/DeepSpeech
    [... edit & commit ...]
    git push # Github automatically creates and pushes to a branch called qznc/master
    [ one more click to open a pull request ]
All this assuming I have no write access to the repo and without creating a repo at https://github.com/qznc/DeepSpeech.


Absolutely. I've never understood the reason for requiring you to make a fork, just add my changes to the PR (in an anonymous branch for example) and be done with it! I don't want to 'fork' your repository to propose a spelling mistake fix.


Anecdotally, being able to commit anywhere, without needing a branch name, is one of the biggest conceptual differences when we teach new hires Mercurial at Facebook.


As a convinced Git user, this is the one aspect of the article that I found genuinely interesting. I mean, you can commit anywhere in Git, too (and I occasionally do for some advanced use cases), but it tends to not be well supported by Git.

(And, I might add, probably not by Hg either if it requires enabling an extension...)

Anyway, hg show work looks like an interesting thing. It's perhaps a bit more magic than Git usually goes for, since you can't just show all DAG sinks (it would get messy really quickly with amends and rebases). And I don't think I'd benefit too much from it, since I've come to prefer having all my work-in-progress on a single branch. But I can see how someone could appreciate the feature.


> As a convinced Git user, this is the one aspect of the article that I found genuinely interesting. I mean, you can commit anywhere in Git, too (and I occasionally do for some advanced use cases), but it tends to not be well supported by Git.

"Not well supported" in this context means that you can lose data (as commits that are not reachable from a ref can be garbage collected). Mind you, you have to ignore warnings/errors to get there, but without ignoring them, you also won't be able to actually have anonymous branches.

> (And, I might add, probably not by Hg either if it requires enabling an extension...)

It does not require an extension. The `show` extension that Greg talks about is a smart log display, which shows a contextual log for your current work. Anonymous branches have been part of Mercurial from the beginning, long before the `show` extension existed.

For what it's worth, it's not so much that Mercurial supports it, but that Git doesn't. Git is rather unique in its setup in this regard. No other VCS that I know of uses garbage collection and in particular branches to prevent revisions from being garbage collected. (Some may require you to name branches for other reasons, but not so you don't lose commits.)

It's probably also worth noting that we have an implementation detail leaking into user space here. Git uses what is essentially purely functional data structures underneath to achieve atomicity in the absence of an actual database engine [1]; a different implementation, such as on top of SQLite, could avoid this.

[1] Atomicity here is not about locking, but about a transaction being interrupted by the user or an external event, such as a shutdown.


It is supported, it's just that the only thing keeping commits not on a branch alive is the reflog.

Question: How does Mercurial deal with garbage collection? After all, the desire for garbage collection is by far not unique to Git -- any version control system that has the equivalent of `commit --amend` and rebase should provide it.

As for not needing the extension, it seems to me that having "dangling" commits would be very difficult to use without some decent visualization of the dangling commits, such as what the blog post shows with `hg show`.

> It's probably also worth noting that we have an implementation detail leaking into user space here.

I find this comment absolutely fascinating, because truly, this is not an implementation detail leak at all.

The point of the purely functional data structures isn't to achieve atomicity (although potentially being a bit more robust to power loss etc. is certainly a nice side effect), it's a way of thinking about version control. I've never heard functional programmers use atomicity as the main argument for immutable data structures, either...

The whole point of Git's design is that it chose a robust and crystal clear way of thinking about distributed versioning as its underlying model of what version control is, and then simply provided tools for manipulating that DAG. Some of those are a bit ugly because of how the system grew over time, but still, this is how software should be designed: have a clear model of what the data is, then provide tools for manipulating it.

For what it's worth, the underlying data store of Git could quite easily support "unnamed" tip commits as well. What you'd need is a change in the garbage collection policy, and "porcelain". Which brings me back to the question of how Mercurial does it.


> Question: How does Mercurial deal with garbage collection? After all, the desire for garbage collection is by far not unique to Git -- any version control system that has the equivalent of `commit --amend` and rebase should provide it.

Garbage collection is an issue that is 100% unique to Git. No other VCS even thinks about throwing user data in the repository away without the user explicitly telling it to. Once you have a user telling you to throw the data away, it can do that. There is no need for GC; this is purely an artifact of Git's implementation. I'm honestly not sure why you think you'd even need a GC for `hg commit --amend` or `hg rebase` (or similar operations in other VCSes).

> As for not needing the extension, it seems to me that having "dangling" commits would be very difficult to use without some decent visualization of the dangling commits, such as what the blog post shows with `hg show`.

I'm not sure where you get the idea. This feature is, after all, not unique to Mercurial. It's Git that has the oddball semantics that no other VCS on earth has. Mercurial has been able to graphically show the graph for ages and the ability to just list open heads, too (`hg heads`). If you look at the code, the implementation of the show command is largely just a templated graphlog of a particular revset. For example, `hg wip` [1] (for "work in progress") has been doing something similar just using revsets and templates from core Mercurial.
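For illustration, a rough `.hgrc` approximation of such an alias (the linked post layers revset aliases and templates on top; the `obsolete()` revset assumes obsolescence support is enabled):

    [alias]
    # show the graph of everything not yet published
    wip = log --graph --rev "not public() and not obsolete()"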

> The point of the purely functional data structures isn't to achieve atomicity (although potentially being a bit more robust to power loss etc. is certainly a nice side effect), it's a way of thinking about version control. I've never heard functional programmers use atomicity as the main argument for immutable data structures, either...

This is because functional languages do not have to worry about their state being destroyed by the user hitting Control-C or a power outage. This will simply terminate the program, whereas for Git it will interrupt a transaction in progress.

> The whole point of Git's design is that it chose a robust and crystal clear way of thinking about distributed versioning as its underlying model of what version control is, and then simply provided tools for manipulating that DAG.

This is what other version control systems do, too, without relying on purely functional data structures. The fact that the data structures are purely functional is, after all, not a property that is visible to the user other than through the side effects of garbage collection.

[1] http://jordi.inversethought.com/blog/customising-mercurial-l...


> I'm honestly not sure why you think you'd even need a GC for `hg commit --amend` or `hg rebase` (or similar operations in other VCSes).

Maybe you don't call it GC, but doing a rebase in Git leaves the old, pre-rebase version around. That is a feature: over the years, it has happened to me more than once that I'd missed something when resolving complex conflicts during the rebase. Being able to refer back to the state from before the rebase was very helpful in these cases.

If Mercurial throws the old version away unconditionally, that would suck very much indeed.

If Mercurial keeps the old version, then perhaps at some point in the future I'd really rather have all that old data removed as a simple matter of saving disk space. Surely nobody wants to do that manually?

Hence: you either have a system that makes it much easier than Git to lose data, or you need garbage collection.

I don't know what Mercurial does, but somehow, the fact that this dilemma isn't obvious to you -- somebody who clearly seems to know a lot about Mercurial -- doesn't instill a lot of confidence in it.


> If Mercurial throws the old version away unconditionally, that would suck very much indeed.

Which is why it isn't done.

In core Mercurial, the old revisions are stored in a backup bundle in a separate backup directory. Note that bundles can transparently be used as read-only repositories, so you can view their logs as though they were still part of the parent repo, diff against them, pull from them, etc.

With the evolve extension, those revisions will simply be marked as obsolete, with obsolescence markers showing which revisions were replaced by which. The commits will be hidden, but are still part of the repository. If you ever want to get rid of the old revisions, you'd have to use (say) `hg strip -r 'extinct()'`, which would store them as bundles as described above, or clone the repository and delete the old repository.
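Concretely, the strip/backup round trip looks roughly like this (the bundle file name is a placeholder):

    hg strip -r 'extinct()'                   # writes a backup bundle under .hg/strip-backup/
    hg log -R .hg/strip-backup/<bundle>.hg    # browse the bundle as a read-only repo
    hg pull .hg/strip-backup/<bundle>.hg      # undo: pull the stripped commits back in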

Plus, there are public, draft, and secret changesets. Public changesets are immutable and cannot be changed without user override.

Bazaar rebase will simply hide the old revisions; you can recover them with `bzr heads --all`. To permanently delete the revisions, you have to clone the repository and delete the old version (and all backups). And, of course, there's rarely a reason to use rebase in Bazaar.

> Hence: you either have a system that makes it much easier than Git to lose data, or you need garbage collection.

As I described above, neither. In every case, you need to go through several steps each requiring the user to affirmatively express their desire to delete data.

And as disk space is really cheap these days, hardly anyone ever actually deletes the data in practice, as there's no point to it.

> I don't know what Mercurial does, but somehow, the fact that this dilemma isn't obvious to you -- somebody who clearly seems to know a lot about Mercurial -- doesn't instill a lot of confidence in it.

I think your dilemma is largely an imaginary one, fretting over a resource (disk space) that is too plentiful to require micromanagement.

Keep in mind that most of the data in your repository will come from other people; there's only so much source code or text that a single person can write in a day. If you're generating massively large binary assets, a DVCS is probably the wrong tool, anyway, because of scaling concerns. This inherently limits the amount of "wasted" data that you can have in a repository to a percentage of the repository size.


> I think your dilemma is largely an imaginary one, fretting over a resource (disk space) that is too plentiful to require micromanagement.

I think the need for GC in git is likely also tied to its original resource-intensive implementation. Pack files were added to fix that, but then you need to GC the blobs to prevent storage blowup.

Mercurial and most other VCS have delta storage as their base format which avoids this issue.


Thanks for the explanation.


The thing I love about git is that its internal structure is very simple and transparent, so for any given repository it is possible (although not necessarily easy, if things are well messed-up) to understand what is going on. It is true that the interface is often messy and inconsistent, sometimes annoyingly so, but if I can understand what things are, I can somehow work out how to do things with them (in extreme situations, I know I can fast-export everything, process with custom Python scripts, which are rather easy to write, and fast-import again). Whereas if I do not understand what things are, the most powerful and user-friendly tool will have very little value to me.

Also, knowing what git is under the hood makes me very confident that I will not lose pieces: as soon as the hash of my interesting commits is in another repository (which is not corrupted), I can do whatever I want with the first one and everything I care about will be safe anyway.
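(The escape hatch, spelled out; `rewrite.py` stands in for whatever custom processing is needed:)

    git fast-export --all > /tmp/dump
    python rewrite.py < /tmp/dump > /tmp/dump.fixed
    git init ../repo-clean && cd ../repo-clean
    git fast-import < /tmp/dump.fixed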

All of this depends on me knowing what happens inside git and the git insides being rather simple (a hash tree with a handful of object types); I think I would like Mercurial (or any other DVCS) much more if I knew the same about it: can someone suggest places where this is described?


As a long time mercurial user being forced to switch to git, I find that I know more about git internals than mercurial's. Mercurial provides a nice abstraction so I don't have to care. In fact I've seen a few hints that mercurial has actually changed their internals over the years, but since everything still "just works" I don't think about it. By contrast, git is forcing me to care even though I have better things to do with my time.

Sadly, github and CI systems provide community tools around git, not mercurial (mercurial is an afterthought at best in most CI systems). Those community tools are compelling enough that I'm switching to an inferior system just to get them.


I find the Git UI discussion the least interesting part of Git discussions. Git's UI isn't great, but it usually doesn't matter because 90% of what you're doing is what you always did with Subversion and Hg. The commands are a little more obtuse but that doesn't matter either.

I really like Gregory's discussion about nameless workflows. For small side projects I use Git on a personal server and I don't care to name every change or even use branches. I essentially just want my work sync'd remotely and maybe -- maybe -- I'll look at the history, but almost never. I used to use tar/scp for this before I got worried about overwriting local/remote changes.

I use Git for this now and guess what, my commits are all '.'. I blatantly don't care, and some commits I'm rewriting the entire source tree in a day (like I said, small side projects) so the message would be "Old version sucked, rewrote" and there would be 20 of them.

Mostly this is because I don't care to waste my brain power on managing my changes. That's what I'm using a VCS for. They don't need summaries -- that's what diffs are for -- and they certainly don't need names.

Gregory's argument against forks makes sense to me too. Semantically a fork doesn't really sync back up with the main branch; if it does it's pretty rare -- or just look at the terminology here: branch, root, trunk... FORK. Fork doesn't really go there.


One of the main problems I still face is: Git's CLI is incomprehensible. It's a tangled mess of poorly named commands with even more poorly named options.

If there's a "create" command, its "modify" counterpart is likely to be "create -b -u -t --modify", and its "delete" counterpart is likely to not exist at all or be an arcane incantation of several non-related commands and a push.

Unfortunately, most GUI tools are just thin wrappers on top of all this mess (with the possible exception of GitUp, whose abstractions are also paper-thin at times).


I find this depressing.

I do think the git model is superior to Mercurial's, by far.

I also find the git commands much, much easier to use than Mercurial's, unless I stop caring about having meaningful commits.

With Mercurial I end up just committing lots of work in one go, with minimal logical splitting, and that's that. The Mercurial repos I share with others are full of useless (to me) merge commits because no one can be bothered to rebase, as rebasing is not as easy in Mercurial.

With git I can spend the effort of making clean, logical commits and rebasing so that no one need see pointless internal-to-my-way-of-working commits. I can do this because git does not hide its internals from me. I know git is a pile of anonymous commits that form trees, with branches and tags as symbolic names that resolve to specific commits in the pile.

See https://news.ycombinator.com/item?id=15910094 in this same thread for more on how the git model is natural.


In the beginning I tried to use git exclusively through command line because I thought that's how real men do it.

But after using gitkraken (or similar products) I will never go back to using git through its command line interface. Being able to see the whole repo structure is just so much more convenient and it helps to resolve most merging problems much faster. git guis all have their own problems, but overall using a git gui for me is the better choice in 95% of all use cases.


Yeah being able to see what you do is critical for understanding and feeling comfortable. I do everything via "git", but always have "gitk" open for that visual overview. gitk is very basic and not that great, but does the trick.


I always have both open. Gitkraken for having a broad overview of local and remote branches, stashes, and for staging chunks of code. And the command line for everything else. Some things just confuse me in any GUI Frontend or don't work as fast as on the CLI.


This is what I do too. I like looking at my entire staging area commits using the CLI, but merging makes much more sense using GitKraken. Version control is just too complicated of a problem (for me) to reason about in one 'view', and using different tools to look at separate scales of a repository is the way I've found to work through that difficulty.


First time I hear of gitkraken but why would a Git repository viewer have me create an account!?


The reflog is an invaluable feature, I'm glad that I learned about it right when I started learning how to use git. Knowing that there's little I can do to permanently mess things up has given me the confidence to just try things out, speeding up my learning tremendously.

I agree that the default git porcelain leaves something to be desired. Personally I use and absolutely love magit, slightly tweaked to give me the defaults I want.

The staging area is one of the things that magit really improves on; you retain all its power while making it very cheap to use -- one keystroke to stage the selected file, or all modified files. I stage individual hunks à la `git add --patch` a lot, and sometimes need to include only part of the hunk I'm presented with. It wasn't uncommon for my editing the diff to result in an error with the git cli; now I select the lines I need and just stage those. And if you really don't want to use it, `--all` is easily set as a default for committing.


While I appreciate and agree with the last section regarding replacing GitHub's "fork" model with something lighter-weight, it amuses and saddens me that the author puts a lot of effort into describing something that they feel is wholly novel... but is actually just the model already implemented by Gerrit (https://www.gerritcodereview.com/). There are alternatives to GitHub already! Some do things better than GitHub already! Give it a shot.


A lot of this I can't really speak to, and for some things I can see the point - I use stashes to follow almost exactly the no-staging-area workflow - but in other places this suffers at least a little from being a generalization from one example.

In particular, named branches significantly reduce friction in my day-to-day work. I often have PRs for multiple smallish tickets in flight simultaneously, working on the code for each ticket while I wait for review on the others. This means that branches generally live for about eight hours and I care about maybe three of them at any given time, a different set every day. Invocations are stable, named refs fit in cache and my CLI does fuzzy LRU to improve that further, and I can bounce to the current code for a given ticket/PR at almost the speed of thought. By contrast, the "view the log and copy-paste a hash" workflow makes me cringe - I have to inspect the log every time I want to bounce to the code for a PR, and the invocation to go to a PR changes four times a day? I agree that that makes sense if you have thirty long-running PRs open at any given time and you don't have brainpower to remember branches anyway, and that's very reasonable for a big open-source project, but that's not all use cases.


I held to that famous UNIX mentality for many years, almost since I started programming: use many programs that each do one single thing, pipe the data, print it on the main output, etc. That's how I used git too, writing 160-char-wide commands... I tried playing with Emacs a year and a half ago, and it was a big culture shock; for my first year of using it I still couldn't adopt the kitchen-sink philosophy. But after barely scratching what Emacs had to offer, when I went back to my unix terminal workflow it felt clunkier than ever.

The takeaway from this story is Magit. It totally changed my perception of git, and felt like a more robust and semantically better way to use/understand git. Maybe that was because of a few years of terminal git usage; I don't know how a complete beginner would react to Magit. But what shocked me the most was how my approach to git changed with a different interface. GUIs never appealed to me, but Magit represented a middle ground of GUI-like functionality with a terminal-like interface that got many things semantically right.


<edit> TL;DR: Magit is a lesson in UX. </edit>

+1 for Magit. When I discovered it is when I discovered git. I assume tig must be similar although I've never looked into it but...

With magit, from my editor, I can easily check my git status, choose which hunks to stage or unstage, and commit easily. I can switch branches with less than 10 keystrokes, etc. But to me, what's coolest about it is that I can easily see commit histories and graphs for branches, files, and the repo. With less than 5 keystrokes. And I can navigate these histories easily. Finally!!! Same story for pushing, pulling, merging, rebasing.

All of this you can do in the CLI, obviously. But Magit makes it easy, and because now I have everything I need with regards to git just a few keystrokes away, I can interact with git easily. So I do it.


My only beef with magit is that it's really slow for nontrivial projects ($dayjob's main codebase is 1MLOC and has ~110000 commits reachable from HEAD, "status" takes a good 5s, and adding/removing stuff from d/u or d/s quickly gets painful).


I definitely recommend trying tig if you want a nice curses TUI on top of git.


Magit is great! Even if Emacs is not for you I'd still recommend giving only Magit a try. It transformed my workflow from staging and committing full files in one step to using the staging area with parts of files or single lines if necessary (e.g. leave out work in progress parts of a source file). Some other stuff like interactive rebase I could only grok via Magit which in turn helped my Git-CLI skills.


The only reasons I am urged to use git are its remote branch deletion (which mercurial frowns upon) and Gitlab's posh UI, which no mercurial hosting solution matches.

If hg supports to strip a remote repo, it would be just fantastic!

hg evolve is awesome and topics are fantastic, but a `hg strip <remote-url>` would remove the need for any of these and help me keep my simple histedit, strip, push loop intact!


1. The problem with `hg strip` is how it is (and has to be) implemented, because of the revlog format. Basically, revlogs store changesets in chronological order. Stripping a branch means first saving all commits that are not being stripped (but occur chronologically after the first stripped commit) to a bundle, then truncating the revlog before the first stripped commit, then restoring the commits from the bundle. This can already be an expensive operation locally, but it's even more of a problem on a server shared by multiple users who may have been pushing their own commits.

2. The general recommendation would be to use `hg prune` (part of the evolve extension) instead. Pruning commits will just hide the pruned commits, and pushing will then send the obsolescence markers for those commits to the server, hiding them there, too. This is an append-only operation, so it's cheap and works well even with multiple users. (See the sketch after this list.)

3. In general, though, it is a problem that Git and Mercurial treat remote branches/repositories as second class citizens. This is something that Bazaar got right: Bazaar abstracts over the storage, so (except where limited by network performance) you can do pretty much anything on a remote branch/repo that you can do locally.
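The prune flow from point 2, sketched (assuming the evolve extension is enabled):

    hg prune -r <rev>   # records an obsolescence marker; <rev> is hidden locally
    hg push             # the marker is pushed too, hiding <rev> on the server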


Could that be added as an extension? I don't know how extensions can work with remotes.


I like what this guy has to say so far, but it's a bit surprising that someone with his knowledge got this wrong:

> The Git staging area doesn't have to be this complicated. A re-branding away from index to staging area would go a long way. Adding an alias from git diff --staged to git diff --cached

I dislike the mess of crappy synonyms too, but `git diff --staged` is already an alias; in fact, it's the primary way I diff the staging area, because it's a well-fitting synonym.


> I dislike the mess of crappy synonyms too, but git diff --staged is already an alias

Problematically it's very little used, it's only mentioned once at the very tail end of the `--cached` description, `--cached` is still the primary flag, and the essay remains correct that references throughout the documentation are inconsistent.

It's not exactly hard to miss, if e.g. you learn git through the Pro Git book, the option is not mentioned once, and even the full text search has not indexed it.

Furthermore, there are many other commands which take a --cached flag but don't have a --staged alias (diff-index, submodule, check-attr, apply, rm).


One small improvement would be to make sure that all the options with "--cached" also have "--staged", and make sure that all the documentation uses the term "staging" (deprecating "index" and "cached"). The git UI is historically terrible, but it's gotten a little better over time... it could definitely be improved further with a little more effort.


This article can be summarised as: 1. Cache invalidation 2. Naming of things

They seem like straightforward problems.


Could you elaborate for those of you who don't quite understand?

I understand the reference, but is there more to this comment than a joke?


Sadly not.


>> And the Git staging area should be an opt-in feature.

Thousands of yes! That `git commit -a` just doesn't add untracked files is the most annoying misfeature of any version control system. Oh, I added unwanted project directories? I would then just remove them and put them in `.gitignore`, or create a `.gitignore` prior to `init`.

Of course I could just alias `add -A . && commit -m` on every machine I ever connect to for developing. There's a great, practical solution.


Automatically adding untracked files would make it extremely easy to commit (and without other safeguards, push) unwanted changes. Annoying things like node_modules or binary screenshots that now bloats the repo history forever (unless one rewrites the history), or potential security breaches like passwords/keys or logs/configs with confidential information.


Then there is .gitignore_global for your node_modules/
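Setting that up takes two lines, e.g.:

    git config --global core.excludesFile ~/.gitignore_global
    echo node_modules/ >> ~/.gitignore_global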


> Of course I could just alias `add -A . && commit -m` on every machine I ever connect to for developing. There's a great, practical solution.

Maximally practical or not, if you don't already have a repository for configuration files, shell scripts, aliases, and other little quality of life enhancements then create one today. Your git workflow isn't the only thing that will benefit.

Next time you're hassled to do yet another "git config --global user.name" compare it to the hypothetical overhead of typing "git clone foo/homedir.git ~/homedir && ~/homedir/prep" to get both your name and aliases setup.

I never thought I needed a dotfile repo, but once I made it I realized I waited way too long. It doesn't have to be anything fancy, just take 10 minutes and lay the groundwork to help yourself develop more quickly.


I did try this, but it comes with the extra work of managing yet another git repo, which you conveniently ignore. It requires workarounds such as making a bare repo and turning off showing untracked files, then you have branches for multiple computers, and it's all a mess in no time.
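For reference, the bare-repo workaround described above is usually wired up like this (the `dot` alias name is arbitrary):

    git init --bare "$HOME/.dotfiles"
    alias dot='git --git-dir=$HOME/.dotfiles --work-tree=$HOME'
    dot config status.showUntrackedFiles no    # hide the thousands of untracked $HOME files
    dot add ~/.vimrc && dot commit -m "Track vimrc"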

Perhaps there is a better alternative to saving and loading your dotfiles?


I seem to recall that Visual Studio Code has an option to immediately push all your commits to remote, bypassing staging. I kind of like staging, though; I find it helps keep me focused on small, self-contained pieces of work, instead of working on several pieces at once.


Wow, usually criticism of git is attacking strawmen. This one is actually reasonable and I agree with it.


I've just never found a reason to move on from SVN, to be honest.

I use Git when I am forced to, but for everything under my control, SVN works the best.

I was unaware you could get rid of the staging area in Git. I may need to look more into that.


Honestly I find Mercurial to be a good alternative. It's a lot more intuitive than Git, but with the same basic benefits (local, complete history, more portable, etc)


Is today's Subversion fast enough? My first impression of Mercurial, moving from Subversion, was that it was much faster (I know, Subversion over HTTP at that time was nothing more than WebDAV :-); I migrated all my repositories on day 1. I later noticed that it noticeably slows down with larger repositories, and that became a good reason to try Git out (only to be displeased with its interface, though), but even Mercurial was fast enough to displace Subversion for me.


The speed of Subversion depends heavily on the hardware involved. On an underpowered server and/or over a WAN/internet connection it will be slow. On a fast server in a good LAN it is so fast that the speed difference to git or hg does not matter in practice: you will spend more time entering commands than waiting for the tool to process them.


Git is over complicated for most people’s use cases, but I wouldn’t give up the fast branching and the rebasing features.


Git is only as complicated as you make it. You don't have to use all the features :)


No. That would be like handing a professional compound bow [0] to a newbie instead of a simple one [1]. The note "only as complicated as you make it" would be quite sarcastic in this case.

[0] https://outdoorsexperienceonline.files.wordpress.com/2013/07... [1] https://www.archery360.com/wp-content/uploads/2016/05/tradit...


Apparently SVN has something like that now.


I have to work with SVN quite a lot. It's simpler than git in some ways, but things like merging branches just do not work at all.


Yes, I've seen that pain, but my workflow rarely needs branch merging, so I've avoided that issue.


I do like SVN, and have used it for more than 10 years. It's simple, and I like simple.

But... I do seem to get a lot of annoying conflicts that require resolution, though. Git just works, though it does have a bigger learning curve.


Here are the commands I use:

    git init
    git clone
    git checkout
    git commit
    git commit -m
    git commit --amend
    git rebase
    git add/rm/diff [--cached]
    git push
    git branch

and a few more I can't remember exactly.

I try to keep my git workflow simple. The most complex is probably checkout and rebase with several different branches.


You are already using amend and rebase; you can do plenty of complex and hairy stuff with those, especially as a team. I'm also guessing you use merge and log at times. FWIW this is my philosophy with git too, but have you not experienced the points he brings up? I was nodding my head at all of them; branches are time-consuming, staging is problematic, and forks are a bad analogy.

The recent posts and hype about the Facebook, Google, and Microsoft monorepos should make us ask what our systems for source control are really supposed to do:

  save changes to a source code
  save changes to big binary assets
  annotate those changes
  annotate the files themselves
  tagging versions 
  signing of code
  code review
  handle regressions testing
  collaborating (ACL, DCVS, official monkey patching etc)
  merging
  search on the whole code base
  refactoring on large code bases
All these things have uses, but they are not really supposed to be tied to one VCS. We have very basic needs when working with source code; streamlining the UX for that workflow might be more important, while at the same time opening up easy access to the other stuff.


I especially use:

- git add -e

- git rebase -i --autostash <upstream>

- git checkout -f -- <file> (to undo changes)

- git diff/log (of course)

- git reset HEAD^ (to undo the head commit but leave its changes extant in the workspace so I can then git add -e and commit them in logical units)


"I try to keep my git workflow simple," and yet I see 'git rebase' but no 'git merge' in that list.


`rebase` is simpler than `merge` in larger teams/projects, as the history will be much cleaner.


History is only "cleaner" because Git in its default configuration only throws the raw version graph at you and fails to visualize it in a readable fashion. With a properly structured visualization, such as Bazaar's hierarchical logs, merging is not just as clean, it actually carries more information.

In hierarchical logs, a merge commit stands for the series of commits that are being merged. You can then unfold such commits and view the series of commits as a nested list of commits (which may again contain merge commits).

Think of a merge commit as the equivalent of a procedure call where the procedure body is the series of commits being merged.

(Note that there are other ways to visualize version graphs with merge commits; this is just the simplest way to do it and could actually be easily added on top of Git.)

Rebasing has two problems. It discards the original version structure and it can create commits that never build or don't pass tests (because they never existed as such).

Frequent use of rebasing is almost always an indication that a VCS is lacking important functionality.


> it actually carries more information

None of that information is too relevant. If you have the commits individually cherry picked in your stream, that's most of the info you ever need about those commits. There is some earlier version of those commits in a different stream, where they look different. Usually, who cares.

The way to track that is some sort of additional meta-data. An example of this is the Gerrit Change-ID.


Rebase is not simpler in any context - rebase rewrites history, which, if branches have been pushed to remotes, then necessitates force pushes, which in turn breaks any other instances of the same branch. By using rebase to "keep history clean" you are largely undermining git's power as a DVCS.

Rebase as a tool is not inherently bad but it is definitely not simpler than merge - it introduces additional considerations, requires a deeper understanding of git for effective use and is a dangerous tool in the hands of people who do not understand what it is doing and in my experience most teams that are using it as a core part of their workflow are doing so for the wrong reasons (generally because of a fundamental misunderstanding of how branching and merging works in git and why it works that way).


That is not correct; rebase per se doesn't rewrite history.

Rebase is basically just cherry picks. You can rewind a branch and then cherry-pick, so that the picks are non-fastforward. That's rewriting history. Cherry picking without rolling back is fastforward, and so doesn't rewrite history.

The Gerrit review system on top of Git is based on cherry-picking; it doesn't rewrite history.

If some developers are collaborating on a feature which is on a branch, they could agree from time to time to rebase that branch to a newer mainline.

Whether or not that is done by a history rewrite simply hinges on whether the same branch name is used for the rebased branch or not. The rebase is what it is: if you plant that rewrite as the original branch name, then you have a non-fastforward change. If a new branch name is made, then it isn't a rewrite.

Merging a feature branch onto a trunk can always be done in a fastforward way using rebase/cherry-pick.

Rebasing is provably simpler than merge. Merge depends on complications in the git object representation which could be removed while rebase remains what it is. If you only ever rebase, you never see a commit with multiple parents; the feature is superfluous and turns git histories into hairballs.
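A sketch of that "rebase under a new name" flow, with illustrative branch names:

    git checkout -b feature-v2 master    # new name, so no history is rewritten
    git cherry-pick master..feature      # replay feature's commits; `feature` itself is untouched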


AFAIK rebase always rewrites history. If you rebase your feature branch, you have at least rewritten the history of the local branch even if you then delete it and push to a new remote branch. The trunk will subsequently get a FF merge.

Rebase might give you a "simpler" end result in terms of what the history looks like but conceptually it is much less simple in terms of its mechanism and its implications (e.g. rebasing a branch with multiple contributors screws up the audit trail as it now looks like they made their changes at a different time and in a different context to when they actually did) than the idea of a graph with two branches and a merge commit.

I have seen teams with limited git experience switch from habitually rebasing public branches to accepting merge commits and suddenly cure a whole host of workflow problems.

If you can rebase directly onto a fresh branch (not something I've seen) then I am fairly sure that that's not part of your average workflow - establishing new branches every time you want to update from trunk comes with its own communication overhead too.


Simple or not, it's subjective. As you yourself said:

> > it introduces additional considerations, requires a deeper understanding of git for effective use and is a dangerous tool in the hands of people who do not understand what it is doing

I was taught git with rebase (though merge when I use hg), so I am already used to it. Branching then rebasing is not that difficult if you keep your workflow consistent and iterate quickly. You should be up to date whenever possible. Amending and rewriting history/comments also requires using rebase.

YMMV.


mercurial has "phases", which means that it allows free rebasing and history rewriting for changes that are still not public.

As soon as a tree is pushed, those changes become public, and you need to explicitly force history-rewriting operations.

This is part of the "safe defaults" approach of mercurial.


> I try to keep my git workflow simple.

> git rebase

Yeah... :)


I often rebase my local commits on a remote branch to keep my branch up to date with changes in the remote branch.

I find this simpler and cleaner than introducing a series of merge commits in my local branch.
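That is, roughly:

    git fetch origin
    git rebase origin/master   # replay local commits on top of the updated remote branch
    # equivalently: git pull --rebase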


A merge should always be done by rebasing the to-be-merged material onto the target-of-merge branch. This way not only is the history linear (and thus understandable, rather than some indecipherable "git log --graph" ball of yarn) but the individual changes are properly re-worked to fit into the target.

When we go back and view those changes, they are all in terms of this codebase, not the original changes that do not apply cleanly to this codebase. That's a big problem with git merges: the merge itself is one huge jump that collapses all the changes. The original changes are traceable in its lineage through the second parent, but those are the verbatim original changes, not the merged changes.


I believe a fork of git should be made which entirely eliminates the concept of multiple parents from the representation of a commit. In the same breath, it would drop support for git merge and git log --graph.


In practice I have found the biggest missing feature that I wish existed is the ability to pull a single file from a known ref on a remote server. For example, give me the Dockerfile located in the root of the master branch tip. Can't do that. Instead, you have to clone the entire repo first.


Well, you can clone with --depth=1, so performance-wise it's not too bad at all, but yeah, it still gets all the other latest files.
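For what it's worth, `git archive --remote` comes close to single-file fetches, but only where the server enables upload-archive (many hosts, GitHub included, do not, last I checked):

    git archive --remote=ssh://host/repo.git master Dockerfile | tar -xO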


While many of these criticisms are valid, they seem to be written from the point of view of a user that doesn't really want control over their versions, they just want versions.

I'd suggest that there are other tools for this, such as a properly configured ZFS setup, which is atomic beyond most users' wildest dreams (especially if all the versions of a vim .swp file are retained).

If you just want versions, yes, git gets in your way, because git is about control. Yes if you have all those versions you could go back and 'squash' them into a commit, but can you imagine trying to bisect those to find a bug?!

Somehow I also get the feeling that this perspective is similar to that of a friend who really didn't like the idea of pulling, and that he should just be able to push to the remote repository. A seeming lack of awareness of the collaborative side of version control and development.

Git isn't just about control, it is also about providing additional provenance information and context needed in order to understand what code does and the reasons why changes were made. Sometimes in order to get there we have a bunch of low intention commits that we use to checkpoint, and let's be honest, how many have created commits that put a tree in a broken state? I know I have.

The bit about making the documentation more consistent in its use of vocabulary seems like something that will have a couple of pull requests by the morning.

tl;dr: If you want versions, use ZFS; if you want control and communication, use git. Talking defaults is always a good exercise. EDIT: `hg show work` would be amazing ...

edit2: I just don't see the workspaces model passing the bag of dicks test. I want to code, I don't want to have to become a community moderator and clean up people committing all sorts of crap to my project, and on the other side I just want to code and not have to deal with the bags of dicks who are going to gate keep their projects. There are just too many social edge cases and the differences are mostly just semantic (iirc the way github implements forks is basically as workspaces anyway...). That said, I would really like it to be possible to have pull requests be part of the repo history instead of, say, the listserv.


Well said.

> Most people see version control as an obstacle standing in the way of accomplishing some other task. They just want to save their progress towards some goal. In other words, they want version control to be a save file feature in their workflow.

That was horrifying to me. We _religiously_ put a JIRA-XXX tag in our commits so that every commit carries some context in addition to the message itself. This has been extremely useful for recovering why something was changed on a long-lived project spanning multiple developers. Even if the commit message itself is weak or lazy, we still have the user story in JIRA for full context.
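One way to enforce this kind of policy is a commit-msg hook, roughly like this (PROJ stands in for the real project key):

    #!/bin/sh
    # .git/hooks/commit-msg (make it executable); $1 is the commit message file
    grep -qE 'PROJ-[0-9]+' "$1" || {
        echo "commit message must reference a JIRA ticket (e.g. PROJ-123)" >&2
        exit 1
    }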


Workspaces please, now.

That would be so much nicer than the current need to fork.


Are you familiar with the 'git worktree' command? It's rather new, and it might be what you are after.
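Roughly, assuming a branch named hotfix already exists:

    # Check out a second branch in a sibling directory sharing the same repository
    git worktree add ../myrepo-hotfix hotfix
    # See all attached working trees
    git worktree list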


This may be sacrilege on here, but at my previous job I really enjoyed TFS. It's been a while, but the workflow felt a little more intuitive than git's, and some of the merging features were nicer.


TFS is great if everything in your codebase is a Visual Studio project. If you are trying to use it for other things, there are better version control systems out there.

My ranking is Perforce > Git > TFS > SVN. I haven't used Mercurial. I like Perforce best because of its clean UI, its explicit checkout mechanism (all files are read-only by default, and you have to check out anything before you can edit it), the integration tool GUI, and the ability to cloud-save work without submitting it. TFS is nice because it has built-in code review, but its UI isn't nearly as clean as Perforce's.


I used to work with TFS but after working with git I can't stand it anymore. But it shows that people have different tastes.


Strongly echo this sentiment. The last time I used TFS was probably in 2011, so things might have changed significantly since then, but off the top of my head, things I used to hate but no longer need to worry about since moving from TFS to git:

- Offline? Looks like you'll have to wait to make that change, or any other change for that matter, since you can't check out files while offline. When doing remote work this used to do my head in.

- TFS On-Premise putting read-only locks on every file you have. Why? Great question. It made writing build scripts and their ilk much harder to accomplish.

- Want to make a branch? That'll be an entire copy of the source code, times however many branches you want.


Add to that detection of new files or deletions made outside of Visual Studio. This is so easy in git and a real problem in TFS.


I've worked with TFS and SVN for more than 10 years, and Git for maybe 5.

I find TFS the slowest of the lot, by quite a large margin.


Git's staging area is nothing compared to Mercurial's branch/bookmark ridiculousness, or the need to use plugins for things that Git trivially supports out of the box.


[flagged]


Unfortunately this is simply the reality for many developers.

I have worked with multiple people whose commit logs would look something like...

> e96ddd0 update code

> 65c3072 update code

> dd9ccc1 update code

> 7992ef8 update code

> 6c536e6 update code

> ...

Over and over, several dozen commits. Which is technically fine if they know how to rebase. The overwhelming majority of my commits are simply `git commit -a -m "CI-WIP"` (check in work in progress); rarely do I realistically revert to any of them.

Unfortunately, a lot of developers aren't willing to learn the 4 or 5 basic commands which make for a clean git history.

And when I have to work with them, I really would prefer it if git had something like `git save`, which under the hood did something like `git commit -a --amend`.
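A rough approximation with an alias (just a sketch, not a real subcommand):

    # 'git save': fold all current changes into the previous commit
    # (requires an existing commit to amend onto)
    git config --global alias.save '!git add -A && git commit --amend --no-edit'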


> And when I have to work with them, I really would prefer it if git had something like `git save`, which under the hood did something like `git commit -a --amend`.

This works fine, but it encourages squashing all your changes into a single commit, rather than giving you the opportunity to build a series of commits.

Instead of `git commit -a` I have gotten into the habit of `git add -p` followed by `git commit` and breaking up my current WIP changes into themed-commits. Then later, I perform an interactive rebase and clean up my series (reorder, squash, reword, edit, etc.).
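For instance (the commit message and upstream are placeholders):

    git add -p                        # stage hunks for one themed change
    git commit -m "parser: split out tokenizer"
    # ...repeat add -p / commit for each remaining theme...
    git rebase -i origin/master       # then reorder, squash, reword, edit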

I think, as you and others in this thread describe, the pain is that forming these lightweight commits is cognitively expensive -- but like all things, with experience and practice it becomes nothing at all.


If you use `git commit --fixup=<commit>`, then the new commit will be automatically moved and given the "fixup" action when you do an interactive rebase. (Same for --squash.)

This means you only need to match the fixups to commits once, when you're initially creating them.
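Sketch, with a placeholder SHA and upstream:

    git commit --fixup=abc1234                 # mark the new commit as a fixup of abc1234
    git rebase -i --autosquash origin/master   # fixups are moved and pre-marked in the todo list
    # or enable it permanently: git config --global rebase.autosquash true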


Yes, but I rarely use this as I often haven't made up my mind regarding how the series should be structured.

If instead I could fixup using a tag or theme identifier, which would be used to reorder and group commits, then I would spend less time in my rebase.


> I think, as you and others in this thread describe, the pain is that forming these lightweight commits is cognitively expensive -- but like all things, with experience and practice it becomes nothing at all.

A person can only control their own behavior. :)


You can disable pushes to the master branch and (politely) refuse to merge bad commits.

Back when I used darcs, we gave everyone permission to push to the main repo, but we'd aggressively unpull patches from the server that were bad.


With Mercurial it's all like this. Plus merge commits referencing others' branches full of this sort of history.

History is rather useless if it isn't clean.

Suppose you have a bunch of maintenance releases, and you have to backport some commits occasionally. (Sometimes you realize that you have to do this long after the commits are written.) If those commits are logically organized and minimal, then it's easy to cherry-pick them from the mainline onto the maintenance release branches. If not, then you're in for a world of hurt, and ultimately may have to replicate the relevant bug fixes by hand.
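For example (the branch and SHAs are placeholders):

    git checkout release-1.2
    git cherry-pick abc1234             # one minimal, logically organized commit
    git cherry-pick abc1234^..def5678   # or a contiguous range from mainline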

Clean history is critical to a large project, and especially to a large project with maintenance branches.


And that people, is why all good projects do code review.


The thing with git (the way I see it more and more while working with it) is that it barely cares whether it makes sense to the 'lay user'; instead, it cares about wrapping the most fundamental operations on text files. And that's a good thing in my book. I would not like to see all this ruined just because it was 'zomg too confusing, not useful for most users'.

As the author intimates, most of the problems with git are problems of UI: the UI is simply not very good; it's confusing, inconsistent, and too intimately tied to the implementation. This can and should be fixed.

This is a really interesting article, even if you don't agree with it, and the author has a deep understanding of version control systems (works on hg). There's no need to be dismissive of it.


Commit messages could just as easily be a policy requirement instead of a tool requirement. Maybe all the repos you interact with require a message for every commit, and that's fine.

But maybe the default policy should be permissive. People might decide not to use any tool at all if this is the straw that broke the camel's back for them, and that's a worse outcome in my opinion.

And, honestly, 90% of all commits I've seen use crap messages like "stuff", "fixes", or "change 357" that are so devoid of useful content that I'd argue those commits would be better off with no message at all, just date and author. At least then you wouldn't be distracted by contentless messages that aren't even self-consistent most of the time. This isn't a swing at those people; I'm just being realistic.


I would rather go the other way: commit messages should be modifiable on their own. The idea that commit messages will be gotten right on the first try predates CVS, and it's just wrong.

Who hasn’t put the wrong bug number on a commit? You’re stuck with that. Who hasn’t confidently remarked that they have solved a problem and then discovered they haven’t? Commentary about a block of code needs to be a living document every bit as much as the code it refers to.

Two years from now, when you're trying to figure out what the hell you were doing in a block of code, the message is going to be wrong, and everything you've thought about it since is gone except what you can remember.


How do you like git-notes?
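For reference, notes attach mutable commentary to an existing commit without rewriting it (the messages here are made up):

    # Attach a note to a commit; notes can be edited or appended to later
    git notes add -m "bug number was wrong; actually PROJ-456" HEAD~3
    git notes append -m "fix turned out to be incomplete" HEAD~3
    # Show notes alongside history
    git log --notes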


For a second I thought you'd found something I didn't know about git, but then I realized I'd dismissed notes out of hand. They're not managed automatically, so it's pointless for me to try to sell a large team on using them. We already work with multiple repos, and doubling the workflow won't be appreciated.

And if I recall isn’t it a pain to add additional notes?


"Change 357" could be a sufficient message if there is some other tool to track changes where I can look up what #357 is all about.

I'd argue that for any piece of software where a failure requires a root cause investigation the VCS history is a critical artifact and a great deal of care should be taken to make it easy to understand for what reasons changes happened and how they interact.


There are a lot of very sophisticated things you can do with and to a body of code to fix extremely complicated problems, but the information needed to do so is at times quite delicate.

The overlap of people who believe in Bad Luck and don’t care about transparency is pretty high. Things just are, there’s no one to blame and certainly no lessons to be learned.

They don’t get how breaking the commit history with a mangled move operation or a shitty merge will fuck us over eighteen months from now. Hell, half of these people think they’ll be retired or working someplace else in 18 months anyway.


> "Change 357" could be a sufficient message if there is some other tool to track changes where I can look up what #357 is all about.

Realistically, though, you won't be able to figure out which task management system the issue number refers to, out of the five your team used in the last three years. We have been through two different JIRAs and two different GitHub trackers in the last three years, sometimes multiple simultaneously.

Also, whenever you write /#\d+/ in a Git commit message, GitHub will stubbornly assume the number refers to GitHub issues and PRs, even when it doesn't. And some of our repos use both GitHub issues (for external input) and JIRA issues (for internal planning).


If you switch issue trackers every ten months without a migration strategy for old issues, that's a pretty serious organizational problem. I hope that's not a mature company.


I think in a feature-branching workflow it would be more useful for the branch to have a message, not the commit. More often than not, a commit is too small a unit to be meaningful.


> The general sentiment of this article is in things like:

> > A commit message is already too annoying for many users!

You have misrepresented the article. It actually says:

> [Commit messages] can be annoying. But we generally accept that as the price you pay for version control: that commit message has value to others (or even your future self).

Maybe you misunderstood "generally accept"; it could be taken to mean "this is what people normally think (but actually they're wrong)". But it's clear from the context that he doesn't mean that: he's saying that commit messages really are worth the effort (while admitting that they do take some effort). It's just context, set up to contrast with the claim that a separate staging step is not (always, or by default) worth the effort.


> So because your ass is just lazy another guy a few months down the road has to suffer (in this case decipher what it is you wanted to do with your changes)?

The problem that Greg is referring to here is that `git commit` (as for many other version control systems) overloads two distinct operations: checkpointing your work and creating a new revision. Forcing you to provide messages for the former use case can even be counterproductive [1].

Contrast this with, say, Smalltalk's ENVY: saving a method automatically versioned the method (without the need of providing any other metadata), but you could also separately create system snapshots as named versions.

Common Git usage is to squash checkpoint commits anyway (a sketch follows below), with individual commit messages often disappearing in the process.

[1] https://xkcd.com/1296/
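Regarding that squashing: a minimal sketch of collapsing checkpoints before publishing:

    # Open the last few commits for editing; in the todo list, mark each
    # checkpoint as 'fixup' (or 'squash') so only the real revision survives
    git rebase -i HEAD~5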


... so have a policy that all your "checkpoint" commits just have "checkpoint" as the message. Or ".".

Forcing a commit message is a good default, because it's a good idea to nudge developers towards properly expressing what they're doing. But if you think you know what you're doing, it's easy enough to work around.


Checkpointing generally differs from commits in more than just not having a commit message (e.g.: should checkpoints be pushable? should they be suppressed in the log by default? should they be part of the normal revision graph or a separate hierarchy?).

Yes, you can emulate them, because Git is a general versioned, hierarchical key-value store. But by the same token, you don't really need Git; you could just store each new version in a separate directory. A version control system is about more than raw access to a storage engine; it needs to support appropriate high-level operations and integrate with the development process.

The problem here is the Procrustean way in which people try to force everything to fit Git's model, rather than think beyond its limitations (many of which are shared by other VCSes; I don't want to single Git out here) and address these problems at the user-experience level (especially when we're talking about normal users, such as writers and artists, who may not have a deep understanding of the technical details).


You can use Git for things other than open source software development.


Reading this article is harder than learning how to use git the way it's intended to be used.



