How to Write a Git Commit Message (beams.io)
202 points by pwg on Sept 13, 2015 | 89 comments



The biggest problem with commit messages: They lie.

The main reason for this is that most people tend to put more than one atomic change into a single commit, but in the message they often only mention the "main change" which they introduced with the commit. This in turn makes it much harder to find out which commit introduced a given problem into the code base since people will usually read the (incomplete) message of a given commit and think "the problem can't be here because this commit only did X" while it actually did Y & Z as well. So, personally, I think we might as well abandon writing commit messages entirely and instead make sure that individual changes are small enough so that we can figure out what happened by looking at the actual code instead.

As an example, if you look at individual commits in the Django project (such as this one: https://github.com/django/django/commit/e7e8d30cae9457339eb4...), it is often much easier to figure out what happened by looking at the file diff than the commit message.

A tool that could summarize changes in source code using advanced (semantic) diffing instead of line-by-line diffs would make this much easier of course.


With magit, it's easy to only stage individual 'hunks' so that you can have only the relevant parts of a file staged. And so this problem goes away.

I also imagine it's easy to do this in fugitive as well.

It's probably also possible on the cli, but I imagine that would be too 'hard' or time consuming, hence you will probably start to be lazy about doing it correctly.


>It's probably also possible on the cli, but I imagine that would be too 'hard' or time consuming, hence you will probably start to be lazy about doing it correctly.

I do it all the time, you just use `git add -p` and it steps through hunks letting you stage them or not. As long as you don't make too many changes without committing, it's not particularly tedious.


This was the 'killer app' that compelled me to switch to Git from Subversion back in the day, rather than anything else about Git intrinsically. The ability to organise my commits logically rather than temporally (i.e. not just as a stupid log of change over time) was like night and day.


> ability to organise my commits logically rather than temporally

I'm just getting started with Git; how do you do that? I googled it but nothing jumped out at me.


He referred to what the GP said - use "git add -p". Here's an introduction (I haven't watched it myself and can't comment on the quality):

http://johnkary.net/blog/git-add-p-the-most-powerful-git-fea...


I recommend setting up a shell alias to do this. I have all of my frequently used git commands aliased to two- or three-letter mnemonics.


Sweet! I never bothered to check how to do it on the cli, and instead of being interactive, I just assumed you would have to reference the hunks by line number or something. Naive of me.


It also works with reset and checkout, BTW.


Wow thank you. I had no idea that was something you could do!


You can simply use `git add -p` to stage hunks individually (or `git reset -p` to unstage some hunks).


"git add -p" can be fun, but it's a little scary because it's super easy to end up making commits that don't actually stand on their own. Super simple to miss an import here or a new field there. If you do a whole series of them, it might actually be worse for others to come back to (or bisect in) if they don't realize the original developer never actually compiled and tested each commit as-is.


Well ideally you could use `git stash -k -u` (`--keep-index`, `--include-untracked`) to set your working directory to the state you're actually committing, and then run some tests.

For the most part though, I just try to avoid having so many hunks to step through that this is even an issue.


And `git checkout -p` to erase hunks from the working directory. (i.e. delete without saving!)


That's a little dangerous! I use git stash -p instead, and only drop it after I'm absolutely sure.


> It's probably also possible on the cli, but I imagine that would be too 'hard' or time consuming, hence you will probably start to be lazy about doing it correctly.

It's extremely easy. git add -p and then you can select to add the hunk or not.


The UI for this in SourceTree is also really nice. Actually my favorite thing about ST is that I learned so much more about git from using it.


This, in my opinion, is one of the very compelling reasons to use a GUI for these types of operations.

As an example, refactoring something that touches many files often leads to several related/required changes that aren't part of the main refactor. When you first do the change, you're not 100% sure it will stay around, and committing at this point can be a pain later. As a result, you can end up with many files changed and several logical units of work done, and some may be [parts of] a single file, while some may be [parts of] many files. For 5 or 6 hunks, git CLI is usable. Beyond that, for say, a hundred, a UI where you can jump around is basically essential in order to make usable commits.

I know there are still people that snobbily look down on and dismiss GUI tools, but some things lend themselves well to GUI, so I'd suggest giving them a try.

SourceTree in particular works seamlessly with CLI. When I first started with it (being used to git CLI), I jumped back and forth quite a bit with no issue. Now I really only use git CLI for remote branch operations or viewing reflog, and occasionally for a 'git commit -am' if I happen to already be in a shell.


The other thing for me is that git is inherently very very stateful. There's tons of detail to keep in mind as you execute commands - your branch, what's staged or isn't, the state of the remotes, whether you have anything stashed, etc. To me that's a recipe for a tool that should be used through a GUI.


> most people tend to put more than one atomic change into a single commit

That's a people problem. Correct it through proper training and mentoring, not letting it continue just because "that's what people do."

Historically, some people were afraid of "wasting" commit numbers (CVS, SVN, mock revision numbers in hg), but git has no concept of an incremental commit number, so you can burn through as many commits as you want without feeling guilty about running up an auto-incrementing counter.


How do you "waste" commit numbers? They're just numbers. I believe the concern is about log space. (And git has a log too.)

Hopefully someone reads that log and they shouldn't be bothered by a thousand trivial changes, the reasoning goes.

Which is true to some extent; it's just that not everyone gets it right ... and that's where your comment about mentoring and training comes in. It must also be OK to make mistakes, so that people don't try to hide slip-ups in the next commit.


I have worked with people who are obsessive about keeping auto increment numbers in databases "tidy". It's obviously nonsense but some people aren't logical. The same thing applies to some projects' fears of actually following semantic versioning. Numbers are infinitesimally cheap; there should be no fear about burning them.


If you're using pull requests and reviewing them before they're merged, you could make inaccurate or incomplete commit summaries grounds for rejection of the request. Ask them to fix it using interactive rebase.


I'm curious how many of us interactive rebase every patchset before merging? For me it's critical because I tend to commit too frequently.


For what it's worth, microcommits + rebase for fast-forward merges are the basis of the workflow used in the GNOME project.

As far as I know, it's more or less also what is used by kernel people before hitting the tree of a maintainer (and it's non-ff merges from there).


I would think that scanning through commits is a pretty inefficient way to find out which commit introduced a problem anyway.

"git bisect", "git blame" and "git log -S" are my tools of choice.

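For readers wondering why `git bisect` beats scanning the log: the idea behind it is a binary search over history. The following is only a sketch of that idea in Python, not git's actual implementation. It assumes a linear history ordered oldest to newest, and `is_bad` is a hypothetical stand-in for "check out this commit and run the test".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes the first commit is good, the last one is bad, and every
    commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical history of 1000 commits where the bug arrived in "c0613".
history = [f"c{i:04d}" for i in range(1000)]
tested = []


def check(commit: str) -> bool:
    tested.append(commit)  # record which commits actually get tested
    return commit >= "c0613"


culprit = first_bad_commit(history, check)
```

In this made-up history the search tests at most ten commits out of a thousand, which is why it tends to be faster than reading messages one by one.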

You don’t need a magic tool to do this - what we need is a way to block commits that don’t have corresponding documentation of the changes. A tool that made you write a comment message for each code change block would get the job done - if you commit 10 changes then you would have to write 10 messages explaining each of the changes.
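
A rough sketch of what such a check might look like, in Python. It counts hunk headers (`@@` lines) in a unified diff and compares that with the number of bullet lines in the message body. The bullet format and the function names are assumptions made for illustration, not an existing tool.

```python
def count_hunks(unified_diff: str) -> int:
    """Count hunk headers ("@@ ... @@") in a unified diff."""
    return sum(1 for line in unified_diff.splitlines() if line.startswith("@@"))


def count_explanations(message: str) -> int:
    """Count bullet lines ("- ...") in the body; the bullet format is an assumption."""
    body = message.splitlines()[1:]  # skip the subject line
    return sum(1 for line in body if line.startswith("- "))


def message_covers_diff(message: str, unified_diff: str) -> bool:
    """True when the message has at least one bullet per hunk."""
    return count_explanations(message) >= count_hunks(unified_diff)


# Made-up two-hunk diff and two candidate messages.
diff = """\
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
-import os
+import sys
 x = 1
 y = 2
@@ -10,2 +10,3 @@
 def main():
+    print(sys.argv)
     pass
"""
good = "Print argv in main\n\n- Swap os import for sys\n- Print argv at startup"
bad = "Print argv in main"
```

Here `message_covers_diff(good, diff)` passes and `message_covers_diff(bad, diff)` does not. Counting bullets says nothing about whether the explanations are any good, of course.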


> more than one atomic change into a single commit

There is no such thing as an "atomic change". Sometimes, fixing a single bug or adding a single feature requires editing multiple files or even making complex changes. I personally don't like these projects with 1 commit per file change; that's ridiculous and it's noisy.


I think he means a single feature.

Like a single commit "Fix bug XYZ", that in reality also contains "Fix typo in error messages", "Change rendering of status page", "Fix test framework DSL to prevent infinite loop".

Naively you could say that each of those should go in their own commit, but reality is that they may actually be quite small and necessary and may not even be seen as a feature by the author or reviewer. Only 6 months down the line, you read the code and wonder why the f*ck there is a change in the test DSL in order to fix bug XYZ.


>Sometimes, fixing a single bug ,adding a single feature requires the edition of multiple files

Congrats, that's one atomic change. He didn't say "one commit per file change", he said "one commit per atomic change", and yes, sometimes those atomic changes can spread across multiple files.


Out of all the common advice on Git commit messages, always using the imperative mood in the title is what I find hardest to agree with. Describing your own actions as a committer in the imperative may sound strange, true, but there is the worse problem of making less obvious the distinction between the actions of the system and those of the committer.

In my own projects I use the imperative when describing the actions the system should perform ("Don't warn user about missing cache directory") but use the indicative mood when describing my own actions ("Removed unnecessary cache directory warning"). I've noticed that some projects use the imperative for both cases and I suspect this introduces a degree of cognitive overhead to scanning the commit log. One solution to this problem would be to never "address the system" in the commit title but I find that that is often the shortest and the most expressive way to describe an update.


I think GitHub has broken us a lot here.

Commits, when extracted from the repository, need to stand on their own. Example: sending a commit over email. Then you look at a commit message and you do ask "What does this commit do?" The commit doesn't "removed unnecessary things" because that just sounds wrong—the commit didn't already do things before it existed.

The weirdness comes from describing what the author already did to even create the commit versus what someone encounters when they read the commit for the first time. Your historical actions don't matter in writing results, only future readers matter for what they discover the patch will change.


Agreed. GitHub has really done a lot to harm the quality of commits. Their tools emphasize looking at and commenting on the total diff of the patch set rather than inspecting each commit on its own. I tried commenting on the patches themselves once only to realize that the pull request page didn't show them and they were hard for the author to find. Unfortunately, it doesn't seem that GitHub is willing to change their code review tools since even Linus Torvalds complained about them and nothing important was changed.


Luckily, since GitHub has a pretty extensive API, this leaves the way open for third-party code review solutions like https://reviewable.io (disclosure: I built it). But yeah, considering how long it took them just to put in a split view, I wouldn't expect a whole lot of improvements in the foreseeable future...


> it doesn't seem that GitHub is willing to change their code review tools

They only have $350 million in funding. What do you expect, new features?

(SpaceX launched rockets into space with $100 million of private Elon Musk money. $350 million from VCs helps GitHub make... webpages.)


A late update on this subthread (I wrote the GP): I found your comment very insightful when I first read it. That the VCS applies a series of patches that are essentially standalone to the initial blank state is something that gets obscured when you use GitHub and similar tools. (And GitHub is a big influence — for me it's a major reason for using Git a lot more than, say, Mercurial or Fossil.) I have since given the matter more thought and have adopted the imperative mood subject line style for a new project I started.

As for "addressing the system", I have decided for now to not do it at all in my subject lines to avoid the confusion.


I find it personally amusing to notice I tend to take the same approach with my commit messages, while never quite intentionally setting out to do so. Reading your comment has left me mildly introspective, wondering why I do so. I have some thoughts that aren't particularly relevant here, but thought it worth mentioning this is a particularly helpful approach--if only because I default to reading imperative msgs as actions/expectations of the system worked on, and indicative msgs as actions of the committers.


I think you should usually be able to reword any "self-action" commits into system action commits.

For example, you could say "Don't issue unnecessary cache directory warning".

Prefacing a commit message with add/remove/delete seems unnecessary.


So how would you suggest to reword "Refactored invoice parser" for instance? I'm not asking the system to do anything here: in fact, it would be quite unfortunate if it started doing anything different from what it does already. No, I'm just stating that this part of the system was a mess since its original creation back in the Jurassic period and I just cleaned it up.


I would reword "Refactored invoice parser" entirely as that message is about as useful as "changed code".


"parse invoices better" ;)


"Refactor invoice parser"


Have you read the comment I'm replying to? I'll repeat the key sentence for you:

> you should usually be able to reword any "self-action" commits into system action commits

So your suggestion here is completely irrelevant, as "system" doesn't refactor anything.


That's not how I've normally seen the advice to write imperative commit messages get interpreted. Quoting Documentation/CodingStyle from the Linux kernel (whose commit messages follow that pattern):

> Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour.

Or, if you prefer not to anthropomorphize the codebase (because it hates that), you could also think of it as instructing someone to make the change (and then supplying a patch implementing that instruction).

And a quick search through the Linux kernel git log turns up 1416 messages of the form "subsystem: Refactor ...".


I guess the important thing is to be consistent with whatever mood you choose.


Does the majority read commit messages as "If applied, this commit will..."?

I always treated log messages as a history of events, i.e. describing what had happened in the past - this commit "fixed" something, and this one "changed" something, "refactored", "implemented", etc.


The "use the imperative mood" trope has gotten completely out of hand, and the "If applied, this commit will ..." argument is entirely nonsensical. You can slightly adjust the lead-in and make the same exact argument for a different tense, e.g., simply take "will" out of the sentence and you have the same argument for the present tense.

The most important thing is to discuss it with your team and consistently use the tense chosen.


A commit object is a container talking about itself. You don't talk about your present state in the past tense.


A commit object is a container of changes. A commit message is a label placed on that container by whoever packed it. It's not the container talking about itself – it's a programmer labeling something for other programmers to read (in the future) about something they have already done.


That would be the egocentric view, instead of letting the commit stand on its own.


I've generally taken the tone of what the commit does: "Fixes widget 22", "Adds foo bar".


Magit for Emacs helps you maintain some of these standards if you enable flyspell for git-commit-mode.

And obviously Emacs is my editor of choice for everything, git commits included.


+1 for Magit and Emacs. They are my not-so-secret weapons.


We advocate also using hashtags and URLs in the commit message body.

Our hashtags correspond to specific areas such as #analytics, #localization, #api. These hashtags are easily searchable by the respective teams, and we also use simple shell scripts that categorize by tag, e.g. "List all the changes related to #analytics".
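
As an illustration of the "categorize by tag" idea, here is a minimal Python sketch (the scripts mentioned above are shell scripts that are not shown; this is not them). It assumes the messages have already been read into a list, and that a tag is `#` followed by a letter, so issue references like `#135` are not treated as tags. The sample messages are made up.

```python
import re
from collections import defaultdict

# A tag is "#" plus a word starting with a letter, at the start of a token.
TAG = re.compile(r"(?<!\S)#([A-Za-z][\w-]*)")


def group_by_tag(messages):
    """Map each hashtag to the subject lines of the commits that mention it."""
    groups = defaultdict(list)
    for message in messages:
        subject = message.splitlines()[0]
        for tag in sorted(set(TAG.findall(message))):
            groups[tag.lower()].append(subject)
    return dict(groups)


log = [
    "Track signup funnel events\n\nAdds events for each signup step. #analytics",
    "Translate settings page\n\nCovers the new strings. #localization #api",
    "Fix event timestamp rounding\n\nFixes #135. #analytics",
]
by_tag = group_by_tag(log)
```

`by_tag["analytics"]` then lists the two analytics-related subjects, which is the "List all the changes related to #analytics" query.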

Our URLs link to the corresponding tasks in Asana, Jira, etc., and also to any relevant background information such as RFCs, ISO standards, NIST docs, etc.


Saw this blog last week : http://ericjmritz.name/2015/05/27/my-global-git-commit-templ...

He uses a template with a few metadata fields, and tags too. In an older entry he mentions using a special WIP tag for successive sequences of iterative changes toward a fix too.


Isn't WIP what branches are for?


I guess I don't branch enough then.


This is a pretty clever idea. I might start playing around with this in my own projects. Thanks for the idea!


I see this done from time to time as [feature] also.


>3. Capitalize the subject line

>4. Do not end the subject line with a period

>5. Use the imperative mood in the subject line

I can't help but feel that these three "rules" should be removed and placed in a separate list of style guidelines since "breaking" them is hardly going to make your commit message unreadable.

E.g.:

    5ba3db6 fixed failing CompositePropertySourceTests.

would still be good enough in my book. I think consistency is actually more important than following those three prescriptions specifically. It only becomes an issue if you capitalize some subjects but not others, alternate between imperatives and past participles, or use different punctuation styles.
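
To make the consistency point concrete, here is a small Python sketch that only checks whether a set of subject lines agree with each other on capitalization and trailing period, whatever the chosen style is. It does not attempt to detect grammatical mood, and the function names and the second sample subject are made up for illustration.

```python
def style_of(subject: str) -> tuple:
    """Describe a subject line as (capitalized?, ends with a period?)."""
    return (subject[:1].isupper(), subject.endswith("."))


def is_consistent(subjects) -> bool:
    """True when every subject uses the same capitalization and punctuation."""
    return len({style_of(s) for s in subjects}) <= 1


uniform = ["fixed failing CompositePropertySourceTests.", "added missing import."]
mixed = ["fixed failing CompositePropertySourceTests.", "Add missing import"]
```

`is_consistent(uniform)` holds even though both subjects break rules 3 and 4, while `is_consistent(mixed)` does not.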


"Fix failing $test" and "Add tests" seems like a commit message anti-pattern. A failing test likely has its failure rooted in an actual problem that you could be describing instead, and a commit message simply stating that you added "multiple tests" lacks a rationale for each and every test added.


A commit message should answer the question "why?".

In the article they ask "Which would you rather read?"

I would prefer to read the first commit message they offer as an example:

"Re-adding ConfigurationPostProcessorTests after its brief removal in r814. @Ignore-ing the testCglibClassesAreLoadedJustInTimeForEnhancement() method as it turns out this was one of the culprits in the recent build breakage. The classloader hacking causes subtle downstream effects, breaking unrelated tests. The test method is still useful, but should only be run on a manual basis to ensure CGLIB is not prematurely classloaded, and should not be run as part of the automated build."

That is informative and gives me the information that I need. It answers the question "why?". And yet, this commit message is held up for criticism. And then, this is given as a recent example:

"Rework @PropertySource early parsing logic"

I have no idea what happened in that commit. And this commit does not answer the question of "why?".

I disagree with this entire article. The examples of "bad" commits are the ones that I would want my co-workers to write. The examples of "good" commits are the kind that would make me angry with my co-workers.

Verbose commit messages are not useful because they are verbose, but a commit message should answer the question "why". Commit messages that answer the question "why" tend to be a bit longer than commit messages that fail to answer that question. I would much rather read a verbose message that answers the question of "why?" than I would read a short message that fails to answer that question.


Agreed -- when I read the article, I found the first set of commits more valuable than the second. Having short commits is better than long commits, but not at the expense of information. I don't need commits that use the words "updated, fixed, added" -- if I'm looking at `blame` I know something was updated/fixed/added. What matters is how something was changed, and sometimes why it was changed.

Write commits that help your coworkers figure out why the code looks the way it does. If you're writing "update FooBarFactory to fix Bug4" you're essentially wasting my time by using git to narrate the obvious.


The answer to "why?" should be either in the commit description or in the linked JIRA issue.


Again, recommending the angular style: https://github.com/angular/angular.js/blob/master/CONTRIBUTI...

IMHO the most important things are where (out of many modules) the change is and what kind it is (a fix? a new feature? refactoring? only some docs?).


Most of those rules are good. But don't lose focus on why you are writing commit messages in the first place which is to help some poor programmer many years from now to track down a bug in the code you wrote.

E.g., his fourth rule for great commit messages is "Do not end the subject line with a period". But if you think it matters whether there is a period at the end of the subject line, then you might have a bit of an anal retentive streak. :) Keeping your opening brace placements and indentation consistent, that's important because code is easier to read if it is formatted consistently. So some details matter and some details do not. Periods at the end of subject lines in commit messages are in the latter category.


The page was very slow to load for me. Here's a mirror in case it goes down completely: https://archive.is/LGRwO


I have some problem with this:

> Use the imperative mood in the subject line

Sometimes, you don't have or don't want to link to a bug tracker bug. So it gets impractical to write something like "fix the window that didn't close at the click of the button". It would be much better to write "Fix: button click did not close the window". This is even more true if you have to write commits in a Latin language.

I guess the bottom line is that synthesis and assertiveness should only be considered a preferred method, not a hard rule.


> So it gets unpractical to write something like "fix the window that didn't close at the click of the button".

Imperative style normally produces more direct messages than that; talk about how it behaves now, not how it used to:

"Close window when button clicked"

Or, assuming "button" here means "close button":

"Close window when user clicks close button"


Yeah, plus, the fact that you fix something is the "why" part of your commit message, it should go in the body. The result could look like this:

> Reduce overlaying div element width
>
> This caused a bug on some devices: the div would overlay the close button, causing it to be unreachable.

I think that commit messages that begin with "fix" are often not very good. Basically, you're saying "It was broken, this commit fixes it", not what was broken nor how your commit is supposed to fix it (yeah, because sometimes people think they are fixing something and actually create another bug in another part of the thing).


Interesting, I'll think about that.


In the past few teams I've led, the requirement was to include a JIRA ticket number in the commit message. With tools like Fisheye, you can click from the commit message to the JIRA and get much more context about the particulars of the commit. It worked exceedingly well. The point is that there is only so much context you can put into a commit message. Sometimes the context lives outside the system and good tool support like Atlassian's suite makes life easier.
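
For anyone who wants to enforce or extract such references themselves, a minimal Python sketch follows. The key pattern (uppercase project key, dash, number) is an assumption about how the tickets are named, and it will also match look-alikes such as "UTF-8". The key `WEB-412` is made up.

```python
import re

# Assumed key shape: uppercase project key, a dash, a number, e.g. "PROJ-123".
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def ticket_keys(message: str) -> list:
    """Return the ticket keys mentioned in a commit message, in order."""
    return TICKET.findall(message)


def has_ticket(message: str) -> bool:
    """True when the message mentions at least one ticket key."""
    return bool(ticket_keys(message))


msg = "Reduce overlaying div element width\n\nThe div covered the close button. See WEB-412."
```

`ticket_keys(msg)` yields the key to link on, and `has_ticket` is the kind of test a commit hook could apply before accepting a message.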


A lot of people do something similar on GitHub. If you're going to do this, please include the issue subject in the commit message somewhere. "Fixes #135" is absolutely meaningless outside the context of whatever tool. And it absolutely falls apart if the project lives long enough to change issue trackers.


Much agreed. "Fixes #123" makes sense for pull request messages, but the commit messages should be verbose!


I was in a team where the tooling required a JIRA ticket reference in the commit message. Of course, the JIRA project didn't have an actual ticket for the majority of minor changes and fixes (and for that matter, adding one for every single commit in a 20-commit topic branch was tedious). Eventually the entire team was just adding "ticket 1" by default to every message.

You can't solve a people problem with tooling. It can help, but ultimately if it's a pain point people will simply bypass it.


I've seen this practice very often lead to less atomic commits, what are your thoughts?


When you work on a team that requires code reviews (pull requests) before every check-in, the "body" portion of the commit does not really make sense as the developer pretty much has to include that information in the code review description either way.

Most teams automatically put a link to the corresponding code review in the commit message/body.


I'd still prefer a body with a good commit description, since review descriptions are not attached to the history.

I've been thinking about this a little recently: the conversations that determine the direction of a product are not part of the history. If a repository is shared, there's no way to see that information without going through emails, or going through issues on GitHub or Bitbucket or whatever else.


Is this something we can change though?

As you mentioned, these conversations happen everywhere including offline meetings and hallway chats.


On GitHub, single-commit PRs automatically have their body text set to the body text of the commit. (Annoyingly, GitHub doesn't reflow the body text, so it looks "jagged"...)

For multiple-commit PRs, I usually just give some surrounding context and let the reviewers read each commit message separately (with the three dot button).


At work they're consolidating systems, and we might lose our entire code review history. Having some redundancy isn't necessarily a bad thing.


I like to point people to the Linux kernel `git log`[1] when talking about the value of good commit messages.

[1]: https://github.com/torvalds/linux/commits/master


I thought for sure it would be about this recently announced project, which also cited Tim Pope: https://github.com/m1foley/fit-commit#readme


I recently started working on a tool that checks for this and more : https://github.com/jorisroovers/gitlint


What I want to know is how to write good commit messages when you're first building your thing. Most of your code is unwritten and what is written is constantly changed to improve the interface(s).


> and doing the right stuff with git when you rename one

What is he referring to?


I have to disagree on "Capitalize the subject line". If you're using lower case consistently, it's an unnecessary stroke of the shift key.


Commit messages sometimes need to be longer to explain something. When that happens, it's nice for the subject to be the first sentence (possibly with the rest on another line).

  Don't warn the user about foo.
  
  The foo warning is superfluous because ...

> it's an unnecessary stroke of the shift key

This is not a good reason to argue for a style convention. The effort or time of pressing "shift+letter" is no greater than simply pressing "letter" for all experienced typists since they are pressed simultaneously and the decision to capitalize a letter and execute the necessary keypress is completely automatic and fluid. If we were discussing rarely-used symbols and keystrokes then the analysis might be different.

There is great value in following the language's style convention since its readers are familiar with that convention. Upper case denotes the beginning of a sentence. When you scan a series of lines beginning in upper case, there's a visual cue that they're each their own unit, as opposed to (for example) the wrapping of a previous line.

  Don't warn in case foo
  Update bar dependency
  Warn user when baz

These scan better to me as a series of three independent thoughts than the equivalent in lower case, which seems like a poorly written haiku:

  don't warn in case foo
  update bar dependency
  warn user when baz

Personal preference might differ, but my opinion based on seeing many variations is that programming texts such as documentation, commit messages, etc., are most readable when they follow the syntax and style of regular English text, to leverage all of the standard cues that the reader is familiar with. Readability usually suffers when one breaks from convention.

Lastly, following regular conventions gives you the advantage that you can copy/paste text across various scenarios without having to change it. Need to reference a commit message in an email? Well, just copy it in. No need to mess with the capitalization. All things considered I see no good reason to begin documentation or commit messages in lower case given the overwhelming conventions to the contrary that exist in our language.


If your project has rules, follow the rules of the project.

Also, the project should follow standard rules everybody else uses (so everybody doesn't have to re-learn project-specific demands).

So, titles of commit messages are caps and don't end with a full stop.
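
Those two rules are simple enough to check mechanically. A minimal Python sketch, with a made-up function name, that reports which of the two a subject line breaks:

```python
def subject_problems(message: str) -> list:
    """List violations of the two rules above for a message's subject line."""
    subject = message.splitlines()[0] if message else ""
    problems = []
    if not subject[:1].isupper():
        problems.append("subject should start with a capital letter")
    if subject.endswith("."):
        problems.append("subject should not end with a full stop")
    return problems
```

For example, `subject_problems("Update bar dependency")` returns an empty list, while `subject_problems("update bar dependency.")` reports both problems.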



