Writing good commit messages is one of those things that IMO make the difference between a dev who produces good code and a dev who produces high quality work in general.
The git log is one of the main entry points to an open source (or closed source, in fact) project. Following it tells a story, and can help you understand the decisions that were made.
I know which one I'd want to debug. This is also the primary reason why I highly dislike merge commits: They make git logs extremely unreadable.
The thing is that writing clean commits is something that is extremely easy to do. Unlike docs and tests, it's not more work (it's less, in fact) and it's not something to continuously maintain as the code changes. It's an immediate improvement you can make to your development habits that will noticeably improve your quality of life. You will catch bugs doing this!
(Tip when writing atomic commits: Use `git add -p`. That lets you stage patch hunks.)
Writing a good commit message is not that hard, and is sometimes actually enjoyable (after all, I get to brag about this feature/bugfix/solution).
What does cause a lot of mental overhead (and I consider myself a decent engineer) is creating commits that have a single purpose in the first place.
Working on a new feature, I often have to refactor something, and while I'm at it, I clean up some related parts. That is at minimum three commits applied in the correct order, sometimes across different branches. At that point there is a lot of `git add -p` and `git stash apply` going on, which takes considerably more mental energy, while you see some colleagues getting away with `git commit -a`.
Writing good commit messages itself is the reward for creating good commits.
This is why I work in graphical clients (GitKraken being my current preference). They make interactive staging of individual files or lines utterly trivial.
I do partial commits significantly more often than committing everything, and I often do things like make a couple of commits, check out another branch (GitKraken does an auto stash + apply), then commit a separate fix there before switching back to the original branch.
It also makes several other things easier: multi-selecting commits to see a combined diff, quickly diffing two arbitrary branches, and just generally browsing back through history. And importantly, it's easy to do all this without having to remember, copy/paste or type any branch names or commit hashes.
I'd say it's also not as functional. With a GUI you can click around and browse through other files, as well as stage/unstage with the same interface.
With the CLI, all you can do is cycle through "Stage this hunk [y,n,q,a,d,e,?]?" prompts, and if you mess up, you have to exit completely and do `git reset --patch` and cycle through those prompts again (at least, I don't know another way to do that). Actually it's a bit worse because it has even more options:
Unstage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? ?
y - unstage this hunk
n - do not unstage this hunk
q - quit; do not unstage this hunk or any of the remaining ones
a - unstage this hunk and all later hunks in the file
d - do not unstage this hunk or any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help
If you use this all the time and get used to the commands, and are also good at picturing the separate pieces you are trying to commit in your head, it's probably fully functional.
If you're like me, and come back to your code after an unrelated 1hr meeting (or lunch) and are trying to sort out the 3 or 4 separate changes you did earlier in the day to make nice logical commits... good luck.
> With the CLI, all you can do is cycle through "Stage this hunk [y,n,q,a,d,e,?]?" prompts, and if you mess up, you have to exit completely and do `git reset --patch` and cycle through those prompts again (at least, I don't know another way to do that). Actually it's a bit worse because it has even more options ...
You can get a hybrid of the two if you use commands like `recountdiff` (from patchutils) and `git apply --cached`. I do this by reading the output of `git diff` into vim, editing diff hunks, running `recountdiff` on those hunks, and running `git apply --cached`. If I mess up, I can always read the output of `git diff --cached` and run `git apply -R --cached` to unstage that hunk.
I find it better than using the CLI menu driven tool that you refer to.
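The recount step exists because hand-editing a hunk leaves the `-start,count +start,count` numbers in its `@@` header out of sync with the body. A rough Python sketch of that step, not patchutils' actual implementation: it assumes input shaped like `git diff` output (each file introduced by a `diff --git` line) and treats an empty line as blank context. Git's own `git apply --recount` option also ignores the header counts and infers them from the patch.

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@ optional trailing text"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@(.*)$")


def _is_body_line(line: str) -> bool:
    # Context (" "), removal ("-"), addition ("+") or the "No newline" marker ("\").
    # An empty line counts as blank context, since editors often strip that space.
    return line == "" or line[0] in " -+\\"


def recount_hunks(patch: str) -> str:
    """Rewrite each hunk header so its line counts match the (edited) hunk body."""
    lines = patch.splitlines()
    out = []
    i = 0
    while i < len(lines):
        header = HUNK_HEADER.match(lines[i])
        if header is None:
            out.append(lines[i])  # file headers and other lines pass through
            i += 1
            continue
        old_start, new_start, trailer = header.groups()
        i += 1
        body = []
        while i < len(lines) and _is_body_line(lines[i]):
            body.append(lines[i])
            i += 1
        # Old side = context + removed lines; new side = context + added lines.
        old_count = sum(1 for line in body if line == "" or line[0] in " -")
        new_count = sum(1 for line in body if line == "" or line[0] in " +")
        out.append(f"@@ -{old_start},{old_count} +{new_start},{new_count} @@{trailer}")
        out.extend(body)
    text = "\n".join(out)
    return text + "\n" if patch.endswith("\n") else text
```

Deleting a `+` line from a hunk, for example, lowers only the new-side count, which is exactly what `git apply --cached` would otherwise reject.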
Rather than `git add -p`, I suggest creating a second clone of your repo, staging your foundational refactor changes in the second repo, creating and merging your commits there, and then rebasing your working branch.
This makes sure you can fully test your refactors and that their change sets stand alone.
If you know your way around git well enough that you're not going to be screwing up repository state in ways you can't fix, there's no reason to operate from separate clones - check out git worktrees :)
I use a long-running `git rebase -i` with a line like `x false` to pause the rebase, run tests or finish packaging the refactor, then `git rebase --continue`.
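The pause can also be inserted by a script instead of by hand. A minimal sketch, assuming you want to stop right after one particular commit: it adds an `exec false` line to the rebase todo list after the commit whose hash starts with a given prefix, and because `false` exits non-zero, the rebase stops there. The function names are hypothetical. (Git 2.20 and later also have a dedicated `break` todo command for this.)

```python
from pathlib import Path

# Todo commands that take a commit hash as their second field.
PICK_LIKE = {"pick", "p", "reword", "r", "edit", "e", "squash", "s", "fixup", "f"}


def add_pause(todo: str, sha_prefix: str) -> str:
    """Insert "exec false" after the todo line for the commit starting with sha_prefix."""
    out = []
    for line in todo.splitlines(keepends=True):
        out.append(line)
        fields = line.split()
        # Todo lines look like "pick 1a2b3c4 Subject"; comments and blanks are skipped.
        if len(fields) >= 2 and fields[0] in PICK_LIKE and fields[1].startswith(sha_prefix):
            if not line.endswith("\n"):
                out[-1] = line + "\n"
            out.append("exec false\n")
    return "".join(out)


def edit_todo_file(path: str, sha_prefix: str) -> None:
    """Rewrite the todo file in place; Git passes its path to the sequence editor."""
    todo = Path(path)
    todo.write_text(add_pause(todo.read_text(), sha_prefix))
```

Saved as a hypothetical `pause_after.py` ending with `import sys; edit_todo_file(sys.argv[2], sys.argv[1])`, it could run non-interactively as `GIT_SEQUENCE_EDITOR="python3 pause_after.py 1a2b3c4" git rebase -i main`, since Git appends the todo file's path to the editor command.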
YES. It's very easy to write good commit messages; what's much harder is making "good" (as in: atomic, understandable) commits. I know I'm just restating what you said, but...I just agree very strongly.
Part of the problem is it's not obvious why you need to do this until you need it, at which point it's too late.
Reasons include: understanding how a change works (all related code in one place without extra distraction), easy to revert, easy to figure out why that code is there during a blame.
> What does cause a lot of mental overhead (and consider myself a decent engineer) is creating commits that have a single purpose in the first place.
That's why I typically implement a feature first and then go through the diff to decide what to stage and what goes into each commit. Trying to do this while implementing something isn't really worthwhile.
Before I recently moved into coding, I was a network engineer. I worked with a LARGE team at my previous job, so documentation was CRITICAL for working difficult technical issues, documenting troubleshooting, changes, etc. If you don't document what you did, what it does, and why, everything goes to hell fast (sadly some places operate like that). So for me it came naturally that you're as responsible for documenting what you do as for doing it.
At my current coding job I'm by no means the fastest coder, or the best, but man, I get praise for documentation, comments, and commit details. It kinda amazes me how many skills that are just straight required in one field earn heaps of praise in others.
Being thorough and highly detailed is definitely something I praise in the colleagues who do it well. Someone taking the time to write good PRs and maintain documentation is definitely more useful in my book than a fast coder. Good documentation reduces the bus factor a lot, and makes someone a team player in my opinion.
> This is also the primary reason why I highly dislike merge commits: They make git logs extremely unreadable.
I really really disagree. If the branch has one logical commit then no merge commit is needed. Otherwise the merge commit shows what happened and more importantly that these N commits came into the mainline as a single change.
And I'm not cherry-picking... I usually see much, much worse git logs than this from merge-commit users. In fact, if I scroll down, this is what it looks like:
You're moving goalposts. Yes almost nobody changes the default merge-commit message. But that's because, like you, they don't know about --first-parent (or they just don't care about commit history)
That doesn't mean using merge-commits is wrong, that means people write bad commit messages, which you already complained about.
… am I moving goalposts? I do know about --first-parent and it is not intended to be a feature to make git logs readable. It gets rid of nearly the entire git history. At this point, you might as well simply go to the github page and look at merged PRs; that will actually get you something readable.
I fear you may be terribly misunderstanding the point of atomic commits and readable commit messages. Everything within a merge commit will still come up during a bisect, for example. If the commits are non-atomic and/or they are badly written, that is still an issue.
--first-parent hides the problem, and adds a new one. I'm not sure what your thinking is. Once again, what's more readable to you of all the screenshots I linked?
I haven't tested it all too thoroughly, but from what `git bisect view` tells me, this seems to work like a `git bisect --first-parent`. I'm wondering why there is no such option...
For anyone unfamiliar with git extension scripts: You should be able to put it in an executable (`chmod +x`) file (say `git-bisect-branch`) in your $PATH to make it usable (`git bisect-branch $bad $good...`).
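For what it's worth, newer Git (2.29 and later) has this built in as `git bisect start --first-parent`. The underlying idea is just a binary search over the mainline. A minimal sketch, assuming `commits` is the first-parent list from good (exclusive) to bad (inclusive), oldest first, as `git rev-list --first-parent --reverse good..bad` would print it, and that `is_bad` is a hypothetical stand-in for building and testing a commit:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit, testing only O(log n) of them.

    Assumes commits[-1] is known bad and that once a commit is bad,
    every later commit in the list is bad too.
    """
    if not commits:
        raise ValueError("need at least the known-bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]
```

With the real command, Git picks the midpoints for you and you mark each one with `git bisect good` or `git bisect bad`.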
> I systematically disable merge commits on every repo I manage.
Do you have buy in from other team members for this?
Also what do you do if master needs to be merged back into develop? Do you really rebase develop and make everyone fiddle with all their feature branches?
There is no develop, master is continuously deployed. Users can create feature branches and PR them but there is just no merge commit, they are pulled in fast-forward. In projects where I need stable releases, I create stable/1.2.x type branches off of master at a point in time. I cherry-pick commits onto it after the first release if there are subsequent ones.
Though merge commits for branches that could have been fast-forwarded (where the merge commit's tree is the same as the tree of its second parent) can be useful, because they provide a way to group related commits for a feature together (`git log merge_commit^..merge_commit^2`). There are features that require more than a single commit to implement.
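The range works because `A..B` means "reachable from B but not from A", so `merge_commit^..merge_commit^2` is the second parent's ancestry minus the first parent's. A toy model of that set difference, using a hypothetical in-memory graph (commit mapped to its parents, first parent first) rather than a real repository:

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merged_commits(parents: dict[str, list[str]], merge: str) -> set[str]:
    """Same selection as `git log merge^..merge^2` for a two-parent merge."""
    first, second = parents[merge][:2]
    return ancestors(parents, second) - ancestors(parents, first)


# main: A - B, feature rebased onto B: F1 - F2, then merged with --no-ff as M.
graph = {
    "A": [], "B": ["A"],
    "F1": ["B"], "F2": ["F1"],
    "M": ["B", "F2"],
}
```

Here `merged_commits(graph, "M")` yields the two feature commits and nothing from main, which is what `git log --oneline M^..M^2` would list in a real repository.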
I go one step further - All changes (even single commits) are in a feature branch.
Feature branch is prefixed with issue number. eg. PROJ-1234-update-ui-to-do-something. That ensures that every commit is logged against an issue.
If you list --first-parent and show the list of merge commits, you see a list of features which were implemented into the dev branch.
If you need to drill down into lower level changes of a feature, you can look at the individual commits that were made in that branch.
That way merge commits are atomic commits of features, and commits in a branch are logical commits/steps for an individual feature.
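A minimal sketch of pulling that feature list out of Git, assuming branches are named like the `PROJ-1234-...` example above and merges keep Git's default `Merge branch '<name>'` subject. It parses subjects as printed by `git log --first-parent --merges --format=%s`; the regexes and function name are illustrative only.

```python
import re

# Git's default merge subject: Merge branch 'NAME' [into TARGET]
MERGE_SUBJECT = re.compile(r"^Merge branch '(?P<branch>[^']+)'")
# Hypothetical Jira-style issue key at the start of the branch name.
ISSUE_KEY = re.compile(r"^(?P<key>[A-Z][A-Z0-9]+-\d+)-")


def features_from_merges(log_output: str) -> list[tuple[str, str]]:
    """Return (issue key, branch name) for each feature merge, in log order."""
    features = []
    for subject in log_output.splitlines():
        merge = MERGE_SUBJECT.match(subject)
        if merge is None:
            continue  # e.g. merges made through a web UI with another format
        branch = merge["branch"]
        issue = ISSUE_KEY.match(branch)
        features.append((issue["key"] if issue else "", branch))
    return features
```

Branches without an issue prefix still show up, just with an empty key, which makes gaps in the naming discipline easy to spot.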
And in answer to the original post - I try to make all commits have a message of what was the type of change, what the change was, and where it was. eg. "Added a clear button to reset all form elements on the update address page". The general rule is, you should be able to read the commit message and know exactly what was changed and where. Compared to "Added clear button".
I am using this method with great success at my current job. It means there is always a clear, well-defined place to look for changes related to issue XYZ.
Support from tooling also helps a lot, e.g. "checkout branch of issue XYZ".
This is IMO a potentially good system but it requires a lot of rigor. In my experience, such systems break down unless they are rigidly enforced, and that's never fun for anybody.
1) Create a pre-commit hook script that only allows commits into feature branches.
This prevents accidentally committing into the dev/master branches (it still allows merges).
2) Create a script file that all developers use to start / end features which handles all the branch creation, pushing and pulling. That way the process is standardised and you can't make a mistake.
eg.
* script startfeature abc123-this-is-new-feature
* script endfeature (this automatically picks up local open features)
The only time there is ever an issue is when the end-feature causes a conflict. This is solved by rebasing your local branch, fixing up any issues in your commits and then ending it again.
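Point 1 might look something like this. It has to be a `pre-commit` hook: Git aborts the commit when that hook exits non-zero, whereas `post-commit` only runs once the commit already exists. A sketch under assumptions: the branch-name pattern is a guess covering both `PROJ-1234-...` and `abc123-...` style names, anything else (including `dev` and `master`) is rejected, and conflicted merges, which are concluded with `git commit` and so do run `pre-commit`, are let through by checking for `MERGE_HEAD`.

```python
import re
import subprocess
import sys

# Hypothetical naming rule: issue key, then a dash-separated description,
# e.g. "PROJ-1234-update-ui" or "abc123-this-is-new-feature".
FEATURE_BRANCH = re.compile(r"^[A-Za-z]+-?\d+-[a-z0-9-]+$")


def is_feature_branch(name: str) -> bool:
    return FEATURE_BRANCH.match(name) is not None


def _git(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["git", *args], capture_output=True, text=True)


def main() -> int:
    """Hook exit status: 0 allows the commit, 1 rejects it."""
    if _git("rev-parse", "-q", "--verify", "MERGE_HEAD").returncode == 0:
        return 0  # finishing a conflicted merge: let it through
    branch = _git("rev-parse", "--abbrev-ref", "HEAD").stdout.strip()
    if is_feature_branch(branch):
        return 0
    print(f"pre-commit: refusing to commit on '{branch}'; start a feature first",
          file=sys.stderr)
    return 1
```

To install it, save it as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` first line and a final `raise SystemExit(main())`, and make it executable.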
I fail to see how that would be useful. Commits are supposed to be atomic, self-contained changes by themselves. More often than not, I see merge commits only when people abuse commits for temporary state of work (with useless commit messages like "fix" and "temp"). Instead, if each commit did one thing and one thing only, why would you need a visual indicator that these set of commits came from a particular branch?
Because they are logically grouped and that grouping is often more important than the commits themselves -- together, they accomplish a higher-level goal.
You want both that goal and the individual tasks which achieved it to be visible.
Not sure I follow your argument. How would you put information about the semantic meaning of a series of commits into "the commit body"?
The only point I'm making essentially is that "the commit" is too granular to capture all the information you need about your work and its history. You need some kind of grouping mechanism as well. It doesn't have to be merge commits, necessarily. But as I understood it, your original argument was that "there is no need for anything higher level than a commit." But there clearly is imo.
That's because, like GitHub itself, the documentation for them is not in the commit messages but in the pull-requests. Look at their PR page (https://github.com/pypa/pipenv/pulls) and individual PRs, you'll see that it's much better organized.
The difference is that now the source of truth is not git itself, but wherever the pull-requests are.
That really ruins a lot of tooling. Going back through history to see when a bug might have been introduced; doing git blame to find out why a line of code is like it is...
If you write good commit messages, you also get good PR messages for free.
I personally don't; that was just a way to explain the whys. But this whole thing is probably a good reason to ask whether using git, a completely decentralized VCS, in a centralized manner is the best way to go. As the sibling comment mentioned, our traditional tooling is limited when working that way, but that doesn't mean the approach is inherently bad; maybe we need new tools?
When I clicked on that link, I was totally prepared to say, "Well that's not that bad." But wow, that's bad. If I were looking at getting involved, that could be enough to stop me. And all those broken builds might be enough to keep me from using the project at all.
>This is also the primary reason why I highly dislike merge commits: They make git logs extremely unreadable.
I am new to git. You will have to merge eventually with somebody when you are collaborating. Can you really avoid merge commits?
I don't like how `git reflog` keeps track of which hashes I switch to. I only jump to hashes for a quick run, and I don't want that tracked in the log. Does anybody know how to avoid that as well?
The trick I've developed to improve my commit messages is to ask myself how I would instruct someone to do what the commit does, in the widest sense. Before that, my messages either amounted to "HAAAAANDS" (https://xkcd.com/1296/ -- there is always an appropriate XKCD), or tended to go into the minutiae of implementation ("added a boolean attribute 'foo' and changed method 'bar' to raise FooBarBazException when the calculated value is 42"). The "instructions" approach helps me focus on why something was done ("implement verification of the IMO field") rather than how (which should generally be obvious from the code itself).
My commits are the exact opposite of that comic. My initial commits are unfocused since it's the start of the project and there's a lot going on. As the project stabilizes and changes are smaller and on point, the commit messages become more focused as well.
I prefer to avoid "intermediary" WIP commits altogether. Of course, I still make such commits to avoid losing my work, but I continuously amend the first commit rather than creating new ones (unless I'm experimenting, which goes to a separate branch anyway). Multiple commits usually lead to a rebase before merging, which tends to be more difficult than amending.
I use the following convention to start the subject of a commit (posted by someone in a similar HN thread):
Add = Create a capability e.g. feature, test, dependency.
Cut = Remove a capability e.g. feature, test, dependency.
Fix = Fix an issue e.g. bug, typo, accident, misstatement.
Bump = Increase the version of something e.g. dependency.
Make = Change the build process, or tooling, or infra.
Start = Begin doing something; e.g. create a feature flag.
Stop = End doing something; e.g. remove a feature flag.
Refactor = A code change that MUST be just a refactoring.
Reformat = Refactor of formatting, e.g. omit whitespace.
Optimize = Refactor of performance, e.g. speed up code.
Document = Refactor of documentation, e.g. help files.
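If you want tooling to nudge people toward this convention, a `commit-msg` hook (Git passes it the path of the file holding the proposed message) could check the first word of the subject. A minimal sketch; the function names and error wording are made up, and Git-generated `Merge` and `Revert` subjects are allowed through since Git writes those itself.

```python
import sys
from pathlib import Path
from typing import Optional

ALLOWED_VERBS = {
    "Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
    "Refactor", "Reformat", "Optimize", "Document",
}
GIT_GENERATED = {"Merge", "Revert"}  # subjects Git writes on its own


def check_subject(message: str) -> Optional[str]:
    """Return an error string, or None if the subject starts with an allowed verb."""
    lines = [l for l in message.splitlines() if l.strip() and not l.startswith("#")]
    if not lines:
        return "empty commit message"
    first_word = lines[0].split()[0]
    if first_word not in ALLOWED_VERBS | GIT_GENERATED:
        return f"subject should start with one of: {', '.join(sorted(ALLOWED_VERBS))}"
    return None


def check_file(path: str) -> int:
    """Hook entry point: 0 accepts the message, 1 rejects it."""
    error = check_subject(Path(path).read_text())
    if error:
        print(f"commit-msg: {error}", file=sys.stderr)
        return 1
    return 0
```

As `.git/hooks/commit-msg`, it would end with `raise SystemExit(check_file(sys.argv[1]))`.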
These sorts of prescriptions always strike me as the sort of fastidiousness that some software developer types are stereotyped with.
Just once I would like to read someone that takes a descriptivist approach to commit messages instead of a prescriptivist approach. I would prefer even a scientific approach where someone sets out to measure if these sorts of measures have a concrete measurable effect beyond people's anecdotal preferences.
Prescribing what a commit message can look like implicitly prescribes what a commit can look like, and that turns a flexible tool into a less flexible one. For many people, lack of flexibility can be a feature... but it is an empirical question whether it aids development, and I am not really aware of anyone trying to measure these things. In the spirit of "No Silver Bullet" by Fred Brooks, I am skeptical that, between the code comments, the code documentation, and the ticketing system, git commits are adding much.
Best comment I read so far on this. Taken a stage further, with a well-written codebase of self-documenting, TDD'd clean code, commit messages become moderately useless. For all the effort they take and the rare occasion they aid in finding something useful, a completely blank entry for every commit is arguably more efficient. In the spirit of it only taking a few seconds whilst your head is in that space, a short brain dump of what it is in any format you like is an excellent and efficient choice.
There does seem to be a strong lure to the ease of cargo-culting over being thoughtful. I guess I am always surprised that it is such a problem in software development because we pride ourselves on using our minds to solve problems.
My take: these aren’t good commit messages. The verb isn’t supposed to be what you did, otherwise it would always be “add” or “change” or “fix”. The verb is supposed to be what the program does thanks to this change. E.g. “Check server fingerprint”, not “Add server fingerprint check”.
We already know you changed or added something, it’s a git commit.
But that's not how git itself makes commit messages: (`Merge branch 'foo'`, `Revert 'some thing'`)
Not every commit changes how the code works. Some is just documentation-related, or formatting, or some other configuration change. The list goes on. Your approach would only make sense in a subset of cases.
IMO the message should be what the commit does, not what the code does.
Git doesn't know enough about the code to make more meaningful messages, so maybe git messages should not be the standard that developer messages strive to meet.
But how would you name the commits that fix, refactor and remove that same feature? "Check server fingerprint with less errors", "Check server fingerprint in a different way" and "Check server fingerprint no more"?
“Handle timeout errors when checking server fingerprint”, “Use new API endpoint for server fingerprint”, and “Replace server fingerprint checking with magic”.
You can still use those verbs, but the interesting thing in commit messages is what they do, rather than what changes you made.
A commit message is really like a small note to future contributors; it’s not always easy to write them, but it’s always worth thinking about them from the perspective of someone who is looking at them two or three years hence.
But "Replace server fingerprint checking with magic" has clearly crossed the line from "the verb describes what the code does" to "the verb describes what the developer did".
"Note to future contributors" is spot on of course. The best way to write better commit messages is to consume the existing corpus as often as possible, e.g. by never trying to understand code without the blame column active in your editor of choice.
My preferred format is a condensed why-what, consisting of "$verb $ticket $wherein":
$verb would be the developer activity, like fix/optimize/remove/.., and it is mostly there to make it clear that the rest of the message should not describe developer activity.
$ticket would reference your beloved issue tracking system (an additional short keyword describing the issue doesn't hurt as a checksum and to prevent excessive referencing, but it has to be optional because a bad keyword is worse than none).
$wherein would be the customary quick rundown of how the code is supposed to work; expect future readers to read only the beginning.
I like this order because it gives a rudimentary sentence structure to the formulaic parts and positions them at the beginning where they can never be pushed below the fold by the potentially rambly description of the code. Anything more complicated than that will degrade harder when rules are not followed to the letter. Perfect is the enemy of good.
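Because the formulaic part sits at the start, it is also easy to parse mechanically. A minimal sketch, assuming Jira-style ticket keys such as `ISSUE-123` and a lowercase verb (other trackers would need a different pattern); the names are illustrative:

```python
import re
from typing import NamedTuple, Optional

SUBJECT = re.compile(
    r"^(?P<verb>[a-z]+)\s+(?P<ticket>[A-Z][A-Z0-9]+-\d+)(?:\s+(?P<wherein>.*))?$"
)


class Subject(NamedTuple):
    verb: str
    ticket: str
    wherein: str


def parse_subject(line: str) -> Optional[Subject]:
    """Split "$verb $ticket $wherein"; return None if the line doesn't follow it."""
    match = SUBJECT.match(line.strip())
    if match is None:
        return None
    return Subject(match["verb"], match["ticket"], match["wherein"] or "")
```

A `None` result could be reported as a warning rather than a hard failure, in keeping with "perfect is the enemy of good".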
In my case I have commits that say "check server fingerprint" then "check server fingerprint but this time it works in production" and then "check server fingerprint but this time it ACTUALLY works in production" followed by "check server fingerprint works in dev and test what is actually happening here" and the final commit of "i hate everything about all of this". Of course that last unhelpful commit message is tied to the code that actually works in production so it's what sticks.
The good news is that you apparently fixed it already at "I hate everything", which is still a number of commits before trying to recite the Macarena from memory.
A good rule/guideline for commit messages should not only help us writing perfect messages when we would otherwise write merely good ones, but also scale down to encourage mediocre messages when we completely stopped caring. "rerere ISSUE-123 there is no hope"
I'm disappointed that the mods changed the title from "Write good commit message" (which is the actual title of the article) to "Write good commit messages," because I think this was an intentional, subtle joke by the author to make the title resemble a commit message.
Yeah, it's the same advice, just Chris Beams did it over 4 years earlier.
This is the article I always link people to on my team when explaining how to write commit messages. My company doesn't have a style guide for this exact thing, so this article is what I've been using. And the results are here: https://github.com/google/nomulus/commits/master
Good commit message summaries are important (the short one-line overview).
Including links to the issues the commit relates to in the body is important.
That said, most of what I've seen in large commit message bodies (like the ones I used to write) really belongs in the patch itself, as changes to the project's formal documentation or source code comments.
If you thought it was worth explaining why you made the change as you did, it likely means the choice was not obvious.
If it wasn't obvious, the explanation belongs in the project proper, where anyone who cares can see it, not tucked away in a commit message that may be hard to discover four or five years down the road.
If it's a decision that impacts UI, the justification belongs in the project specs or docs, where people besides devs can find it (I like to keep my specs and docs as plaintext in the repo and render them to HTML for non-devs to reference).
If it's a strictly internal decision, like what algorithm you chose for a function's internals, you should explain why right there instead of hiding valuable knowledge in the commit message.
> If it wasn't obvious, the explanation belongs in the project proper, where anyone who cares can see it
A lot of times, code comments may not be updated along with code changes, so they may not be accurate. A commit message is associated with a change at the time it was made.
> not tucked away in a commit message that may be hard to discover four or five years down the road.
The git blame command makes it pretty easy to see what commit introduced a line of code and it also makes it easy to see the context of the change (the rest of the diff).
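To go from lines to the commit messages that introduced them in bulk, `git blame --porcelain <file>` prints machine-readable output: a header line starting with the commit hash, `key value` lines such as `summary` (shown the first time a commit appears), and the source line itself prefixed by a tab. A minimal parser sketch; the function name is hypothetical and only the `summary` key is used:

```python
def blame_lines(porcelain: str) -> list[tuple[str, str, str]]:
    """Return (commit hash, commit summary, source line) for every blamed line."""
    summaries: dict[str, str] = {}
    result = []
    commit = None  # hash from the current header line, if any
    for raw in porcelain.split("\n"):
        if not raw:
            continue
        if raw.startswith("\t"):
            # The actual source line; it closes the current entry.
            result.append((commit, summaries.get(commit, ""), raw[1:]))
            commit = None
        elif commit is None:
            # Header: "<hash> <orig line> <final line> [<lines in group>]"
            commit = raw.split(" ", 1)[0]
        elif raw.startswith("summary "):
            summaries[commit] = raw[len("summary "):]
    return result
```

Any hash it returns can then go straight into `git show <hash>` for the full message plus the rest of the diff.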
> If it's a decision that impacts UI, the justification belongs in the project specs or docs, where people besides devs can find it
There's no reason that it can't be recorded in both places.
> If it's a strictly internal decision, like what algorithm you chose for a function's internals, you should explain why right there instead of hiding valuable knowledge in the commit message.
But let's say you want to make a change to the method and you have a comment explaining the change there. Now you make a change to the method and some other part of the code breaks. If you looked at the commit message instead, you can get an explanation and the context of the change (meaning the other parts of the code that relied on the original change you're looking at).
> A lot of times, code comments may not be updated along with code changes, so they may not be accurate. A commit message is associated with a change at the time it was made.
Code review ought to catch comments that haven't been updated.
Further, if a change would have caused the comment to become stale and irrelevant to the project's current state, the commit message would have the same problem. Where you keep it doesn't impact that.
If you want to look at historical states, the comment itself is saved perfectly in the old commit, just like the commit message would be.
> The git blame command makes it pretty easy to see what commit introduced a line of code and it also makes it easy to see the context of the change (the rest of the diff).
I am intimately familiar with `git blame`, and have used it for code archaeology in puzzling codebases over a decade old.
It sure beats having nothing, especially when you configure it to ignore whitespace changes and use Magit's lovely blame interface to move through history quickly, but it can still be a pain to figure out where code really originated from.
If the originating commit gives me a link back to the issue that started it and the code's well documented, I don't need more verbosity.
> There's no reason that it can't be recorded in both places.
There's no reason it can't be recorded in fifty places.
That doesn't make doing so a good idea.
> If you looked at the commit message instead, you can get an explanation and the context of the change (meaning the other parts of the code that relied on the original change you're looking at).
You are describing looking at a commit, not the message. By definition the code changes are not part of the commit message.
At the end of the day, what I described in my comment is something I arrived at after years of writing verbose messages and slowly realizing it wasn't the best way.
You are, of course, free to disagree. Do what works for you.
> Code review ought to catch comments that haven't been updated.
There are a lot of people who just look at the diff and not the rest of the code when reviewing. If the comment doesn't appear in the context lines, they may not catch it.
> if a change would have caused the comment to become stale and irrelevant to the project's current state, the commit message would have the same problem.
Not really. Unlike a comment, which is seen alongside the current code base, a commit with an outdated message tends to show up on very few lines (if any) in the git blame output for a particular file. For example, in a code base I deal with, the first commit for a particular file, whose message explained the rationale and some implementation details, now only shows up in the git blame output for certain blank lines, since most of the file has changed in the years since that commit was made.
> but it can still be a pain to figure out where code really originated from.
You may want to look into the -S and -G parameters of the git log command. They can be used to see when some text was added, removed or moved.
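The two options differ in a way that matters for moved code: `-S<string>` looks for commits that change the number of occurrences of the string in a file, while `-G<regex>` looks for commits whose diff has added or removed lines matching the regex. A toy emulation of that difference for a single file, using Python's `difflib` in place of Git's diff machinery (so hunks may not line up exactly with Git's):

```python
import difflib
import re


def pickaxe_s(old: str, new: str, needle: str) -> bool:
    """Like -S<string>: did the number of occurrences of needle change?"""
    return old.count(needle) != new.count(needle)


def pickaxe_g(old: str, new: str, pattern: str) -> bool:
    """Like -G<regex>: does any added or removed line match pattern?"""
    regex = re.compile(pattern)
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = a[i1:i2] + b[j1:j2]  # removed lines + added lines
        if any(regex.search(line) for line in changed):
            return True
    return False
```

So a commit that only moves a `TIMEOUT = 30` line within a file would be skipped by `git log -S TIMEOUT` but listed by `git log -G TIMEOUT`.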
> If the originating commit gives me a link back to the issue that started
Until you encounter the situation where those links are useless because the system they linked to was migrated to a new platform. Had the actual text been there, then it still would have been useful.
> the code's well documented
In my experience, code comments rarely explain why a change was made. But if the associated commit message does contain that explanation, then it makes it much easier to see the context of the change.
> There's no reason it can't be recorded in fifty places.
>
> That doesn't make doing so a good idea.
That also means you don't have to look in multiple places to find the information you need. The further the documentation is removed from the code, the more likely parts of it will be inaccurate due to changes in the code base, so if you only record documentation in a contract or wiki, then it's very likely that contract/wiki may not be completely accurate.
> You are describing looking at a commit, not the message. By definition the code changes are not part of the commit message.
You can see both by running git show sha1_from_git_blame.
> At the end of the day, what I described in my comment is something I arrived at after years of writing verbose messages and slowly realizing it wasn't the best way.
I've spent years encouraging people to write verbose commit messages for changes they made. I've found them very useful (especially in cases where the person who wrote them no longer works for the company and they're no longer around to ask for further clarification). Whenever I come across a commit message that doesn't explain why a change was made and have to ask the person who made it what they were thinking, I invariably think that it would have been much better if their explanation was in the commit message in the first place.
To clarify, I'm not arguing against commit message bodies entirely. They're valuable and I write them regularly. Not for every change, but lots: maybe sixty to eighty percent.
I'm just saying that huge, multi-paragraph essays are often a sign you're putting information in the wrong place.
I should also add I'm a big believer in small, focused commits - I fairly often will have a branch that has a few hundred lines changed and twenty commits.
You make a good point about the issue tracker dying - I've worried about that but have yet to encounter it in practice. Reducing the impact of such an event is probably a good reason to denormalize a little there.
Meh. This post is not harmful, but focuses on syntax and fails to insist on the most important thing: tell WHY you made the change; the what and how are already apparent in your code.
This is exactly right. It's easy enough to see WHAT happened by looking at the code. Of course you can document the WHY in code also, but that usually ends up in a comment that gets out of date. Whether you're trying to figure out why the shit hit the fan or understanding a new codebase: commit messages that explain reasoning are gold.
I still follow my past team's definition of a good commit: the commit message should answer, in its first line, why this commit was done. Over time I realized that when I git blame I never care what a commit does, because that is always obvious from reading the commit's code. The reason why, however, is most of the time not obvious at all.
One thing I always want to tell people about commit messages:
There’s no length limit.
In fact, write as much as you can. Go crazy. Write some more. Explain. Talk about how your day went. Tell us how you found the problem. Put benchmarks that show why your change makes things faster. Show the stack trace or test output that you’re fixing. Quote other people. Put an email chain in the commit message.
In a well-curated commit history, commit messages become source-level documentation available via an annotate/blame operation. Most people hate writing documentation, but commit messages are about the only time when our tools force us to write something. Take this opportunity to really write something. It’s the only time when writing is really required in any way. There’s no need for a length limit because most of the time the commit messages are hidden away, and most interfaces will hide the full commit message anyway (or can easily be configured to do so).
If you want practice jamming lots of information into a small amount of space, that’s what the first line of the commit message is, but after that, don’t feel constrained by length limits.
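For illustration, here is a minimal Python sketch of a check that holds the first line to a length limit while leaving the body unlimited. The 50-character figure is an assumption (a commonly cited convention, not something git enforces), and `lint_message` is a hypothetical helper, not part of any tool.

```python
from typing import List

# Assumption: the commonly cited 50-character subject convention.
# Git itself enforces no limit, and the body is deliberately not checked.
SUBJECT_LIMIT = 50


def lint_message(message: str) -> List[str]:
    """Return the problems found in a commit message (empty list if none)."""
    # Ignore "#" comment lines (git's default comment character in the editor template).
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    # Skip any leading blank lines before the subject.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ["message is empty"]

    problems = []
    subject = lines[0]
    if len(subject) > SUBJECT_LIMIT:
        problems.append(f"subject is {len(subject)} chars (limit {SUBJECT_LIMIT})")
    # A blank second line separates the subject from the (unlimited) body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank to separate subject from body")
    return problems
```

To wire it up, a commit-msg hook (which git calls with the path to the message file as its first argument) could read that file, print any problems and exit non-zero to abort the commit.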
These are examples of my favourite kind of commits:
From your first link: I think that comment should be placed as a source code comment next to "preferedchunksize = 32768". Magic numbers should be described where they are defined, or at least have a link (in a source code comment) explaining why the magic number was chosen...
When you're on a codebase where people bother writing commit messages, reaching for blame/annotate output becomes second nature and reveals so much about your code. It's just as good if not better than comments, because every line in your source now has a comment.
In fact, every line of your source has several comments! Which may or may not really apply!
BOTH are needed IMHO: comments must communicate what a reader of the code as a whole needs to know, and commit messages must communicate what a reviewer of the change needs to know.
Question: I use past tense instead of present tense because I explain what changed. But I see a lot of commits written in present tense. Is one better than the other? Which one?
You'll get plenty of devs who'd make an impassioned argument that present tense reads better (and they'd be right to an extent), but honestly it's more of an OCD thing than anything. The important thing is that messages are detailed enough to be accurate but terse enough to be eyeballed quickly (if you need more detail, include it after the first-line summary), and that you include reference numbers if your commit relates to a ticket (e.g. JIRA, GitHub Issues, etc.).
Some people add tags, emoticons and other stuff. But the real key is consistency: pick a format and stick with it.
If the key is consistency, use present tense. If you consistently use past tense, that works until your team merges with another one that's more conventional.
The commit message is the headline of your story. Headlines are in present tense even though they describe past events.
Honestly, of all the things people argue about, this has always struck me as the strangest. It makes the least real-world difference (compared to, say, not including ticket numbers), yet it seems to be the thing people get the most impassioned about.
Consistency is important, but if you're expending more energy arguing about it than you would deciphering a past-tense message, then you have serious questions to ask yourself. :P
I tend to use present tense because I'm describing the change that the commit applies to the code. It is also often a few characters terser than past tense, sometimes just enough to fit in the commit title (Add vs Added).
Using the present tense makes sense because it describes the change in its own context, and it leaves the other tenses free for referring to what the code used to do or what it is set up to do in the future.
This is something I put significant effort into. Same thing for test names. Most importantly, I explain why I did something. A test name like "testFilterSpecialCharacters" adds no documentation. "testFilterSpecialCharactersBecauseSpecialCharactersCrashTextBoxLibV1_3" is way better: it adds documentation and context that don't exist in the lines of code in the test. I also included the exact version of the library that motivated the filtering, for convenience. Don't be afraid of long commit messages, test names, or variable names.
I've been using the commitizen prefixes (feat/fix/docs/style/refactor/perf/test/chore) but I've noticed that they make messages lengthier and a little more difficult to understand.
I really wish git would have a built-in category system so that I can automatically generate changelog headings (features, tests, etc...) without sacrificing the legibility of commit messages. Git clients could display this information alongside the message (think of how Github displays the short commit hash).
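Git has no such category system, but changelog tools derive headings from the prefixes themselves. A minimal Python sketch of the idea, not commitizen's actual implementation; the `HEADINGS` mapping and the `group_for_changelog` helper are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed mapping from Conventional Commits types to changelog headings.
# Types not listed here (chore, style, test, ...) end up under "Other".
HEADINGS = {
    "feat": "Features",
    "fix": "Bug Fixes",
    "perf": "Performance",
    "docs": "Documentation",
}

# "type(optional scope)!: summary", e.g. "feat(parser): support arrays".
SUBJECT = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]*)\))?!?: (?P<summary>.+)$")


def group_for_changelog(subjects: Iterable[str]) -> Dict[str, List[str]]:
    """Group commit subjects under changelog headings based on their prefix."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for subject in subjects:
        match = SUBJECT.match(subject)
        if match and match["type"] in HEADINGS:
            # Known prefix: file the readable summary under its heading.
            sections[HEADINGS[match["type"]]].append(match["summary"])
        else:
            # Unknown or missing prefix: keep the full subject so nothing is lost.
            sections["Other"].append(subject)
    return dict(sections)
```

Note that only the prefix carries the category here, which is exactly the legibility trade-off described above: the summary after the colon is what a human reads.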
I have always been extremely particular about commit messages, and IMO the commitizen stuff (aka "semantic commits") is one of those things that is only useful if you personally find it useful and you are the only one working on your project.
I have never seen it work in a team of 2+. People mix up what each prefix means. Hell, I've seen someone "use" it but only ever use "feat:" even for typo fixes. I ended up rewriting his entire git history to strip all "feat:" instances from the commit message since they were just noise.
Basically, the only time I've seen it work is when all of the following are true:
- You're on a solo project
- You commit a lot
- You're very consistent with your prefixing
- You want to use those prefixes to generate changelogs.
I disagree; we've been using it with friends of mine and I enjoy it. The main reason is that it makes generating a changelog easier, as we're following the Angular commit guideline.
I would consider Angular a successful project and this is their git history: https://github.com/angular/angular/commits/master
I mean, I've seen it used by larger teams as well. For example Sentry uses it. I've also had feedback from several devs using it in such teams that the whole thing was "bureaucratic bullshit".
That said if you and your friends are happy users of it, I'm glad. I suspect that if you have a small team that knows each other well enough, that is a bit of an extension of a one-person project and it can still remain. I personally have seen the system crumble enough times with even one single user, not to trust it in the hands of two at once.
> I suspect that if you have a small team that knows each other well enough, that is a bit of an extension of a one-person project and it can still remain.
Well yeah, that's exactly it. Though I've got to admit it doesn't work as well as we'd like for external pull requests, since not all contributors read our commit guidelines.
However, the examples fail by emphasizing what was changed rather than why it was changed. Sometimes those are the same, and that's ok. But when they're not, the why is overwhelmingly more important. The what can be seen by looking at the diff.
These days, why is most usefully a PR number, and your workflow automation turns it into a URL you can click. I.e., why did you make a change not motivated by a PR?
The original project had generic commits... harsh. Mine are just a tad too cryptic. It's so damn easy to be fooled by everything you have in mind when coding.
On GitHub you can also auto-close issues using certain keywords. I use "Fixes #123" for bug fixes and "Resolves #456" when completing features. The nice thing is that it will link to the issue in the GitHub commit history.
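A rough Python sketch of how those keywords can be picked out of a message. The keyword list (close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved) follows GitHub's documentation, but the parsing is a simplification and `closed_issues` is a hypothetical helper: it ignores cross-repository references, and GitHub applies its own rules about when an issue actually gets closed.

```python
import re
from typing import List

# GitHub's closing keywords, matched case-insensitively, followed by "#<number>".
# Simplification: "owner/repo#123" references and full issue URLs are not handled.
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def closed_issues(message: str) -> List[int]:
    """Issue numbers a commit message would close via a closing keyword."""
    return [int(number) for number in CLOSING_REF.findall(message)]
```

Plain mentions such as "see #789" are not matched, since only a keyword directly followed by the reference counts.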
This seems to have become really pervasive the last few years, and I recognise it’s sometimes driven by regulatory/certification requirements. But in the absence of those, what does this gain you over putting a sentence or three of human-readable motivation into the commit message?
All the context from the issue that doesn't fit in a sentence or three? Of course it depends: if the issue has no context it doesn't matter, but things like "who requested/reported this", "who was involved in the decisions made" and "what alternatives were considered and why were they rejected" might not fit in the commit message but do exist in an issue.
If there's stuff that's valuable, I'd prefer to see it pasted into the commit message -- far less likely for links to get broken.
If it's long term valuable, i.e. "I considered obvious, attractive alternative algorithm X but it failed horribly because of Y", I'd prefer that to go in a comment in the code instead. Far more likely to still be noticed in three years time, after the code's been run through two different auto-formatters and otherwise mangled around.
So when you look at the Jira story you know where it is.
When you deploy some code, how do you know what you actually deployed?
You could look at the completed stories, but that's not as solid as checking merges with that story number.
If you use Pivotal Tracker you can enable GitHub integration and either use the story id as the start of the branch name, eg, 12345-fix-foo, or reference it using [#12345] in the commit message to enable PT to add pushes and PRs to the story activity list.
We started using commitizen at my workplace. It really seems to push the idea of good commit logs, combined with Jira ID hooks that keep people from pushing without a story/ticket.
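A minimal Python sketch of the check such a hook might perform. The key pattern (uppercase project key, dash, number) is an assumption about how a team's Jira keys look, and `has_ticket` is a hypothetical helper; being only a shape check, it also accepts strings like "UTF-8".

```python
import re

# Assumed key format: an uppercase project key, a dash and a number, e.g. PROJ-42.
# Shape check only: it would also accept "UTF-8". A stricter hook might compare
# the key against the tracker's actual list of projects.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9_]+-\d+\b")


def has_ticket(message: str) -> bool:
    """True if the message contains something shaped like a Jira issue key."""
    return JIRA_KEY.search(message) is not None
```

In a commit-msg hook, git passes the path of the message file as the first argument and a non-zero exit aborts the commit, so a script could read that file and exit 1 when `has_ticket` returns False. Blocking pushes, as described above, would require running the same check on the server side instead.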
I always appreciate commit messages that are informative and well structured. That said, I'm not a fan of specific formats for commits. The commit messages in a project are where a lot of that project's collective "personality" is stored. If you look through the commits for a project with a "colorful" variety of messages, you'll get a sense not only of the work that was done, but of the people who helped create the project.
Somewhat related: @git_commit_m on twitter has some great (and amusing!) examples of what not to use for commit messages, which are pulled from github's public data set.
The key to writing a good commit message is to write it for others, not for yourself. Many times I see commit messages written for the author, who already has all the context around them.
The Go programming language has very good commit messages.
For me, an ideal git commit message is one that also links to a bug or feature ticket. Both {Github+Github Issues} and {BitBucket + Jira} support this almost seamlessly, I imagine other systems do as well.
I agree here. OP's article mentions Chris Beams' page [1], and I have all our developers follow that structure, which helps, but ultimately the biggest strength is the final line with the link to our JIRA.
There's this advice I read in a similar article that, I think, makes a lot of sense and is ignored in 99 cases out of 100: put a period at the end of a commit message, so a reader can clearly see that nothing is broken or corrupted and that they are seeing the message in full.
Each of the few times I would mention this requirement on a project people would look at me like I'm a retarded child and keep not using the period.
There are some tools that basically don't support this. GitLab is the biggest offender:
1) the default view is all commits in an MR squashed together
2) messages beyond the title are hidden behind a click on some micro expand icon
3) going through an MR commit by commit means clicking on an individual commit, waiting 10 seconds for the damn thing to load its view, then going back and doing it all again when you're done
We use the same settings in Github of squashing PR commits. I guess it's a matter of preference but IMO this workflow is a better version of what the article describes.
You get the best of both worlds, you end up with a very readable history of commit messages, where each one describes a single feature or unit. But while you're working you don't have to break your flow to write documentation every time you commit, or go back later to rearrange/rebase commits.
My favourite anecdote about this is an old coworker of mine who, at a certain point in time, had a number of successive commits with the commit message being his first name.
I'm a big fan of outsourcing this sort of discipline to tools -- check out [Komet](https://github.com/zorgiepoo/Komet) for a commit-specific text editor that makes it easy to write better commit messages.
I would also recommend looking at this guideline and tooling for generating an automatic changelog from your commits: https://www.conventionalcommits.org/
I always force myself to write good commit messages.
But I often ask myself whether it is really useful, or if I'm just too uptight and want things to look good.
When you need an overview of what happened in the last week, do you prefer to always restrict yourself to scanning 10k lines of diff (which may be missing relevant context)? Wouldn't a screenful of short commit log be helpful?
I'm obsessed with writing good commit messages, for a few reasons.
(1) Documentation is important, and comes in a few major forms: docs, code comments, commits, and tests. Docs and code comments are good for initial, high level understanding. Most "bad" documentation appears in the form of code comments and external docs, because they are most likely to drift out of date with the code. Tests do not have this problem (assuming they all pass) because they are in sync with the code, and provide a nice way to understand interfaces and implementations (the "what"). Similarly, commit messages, by their very nature, cannot drift out of date with the code, and provide an opportunity to document the "why". Therefore, commit messages are at least as important as tests, comments, and docs and should be treated with the same respect.
(2) Often the "why" of a particular implementation touches multiple files around the codebase; in many cases you want to document the "why" in comments, but that only helps when it applies to a single section of code. A commit message is an opportunity to document the "why" of an implementation that touches multiple parts of a codebase.
(3) Writing good commit messages forces you to keep the code contents of a commit tightly related to its message, lest the message be inaccurate.
(4) Because good commits are groups of closely related files, you can understand the subtle interactions of a codebase by looking at which files change in the same commit.
(5) A good commit log tells a story and can often provide reasoning behind what may seem like the madness of a legacy codebase. If you don't understand why a file does what it does, just search the history for all commits to that file and you will have a much better idea. There is a cool tool called Gource [0] which visualizes commits to a git repo in a way that can tell such a "story."
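To make point (4) concrete, here is a small Python sketch that counts how often pairs of files change in the same commit. The input format and the `co_change_counts` helper are assumptions for illustration; in practice the per-commit file lists could be parsed from `git log --name-only` output, which is not shown here.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_change_counts(commits: Iterable[List[str]]) -> "Counter[Tuple[str, str]]":
    """Count how many commits touched each pair of files together.

    Assumed input: one list of changed paths per commit.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for files in commits:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts are candidates for the "subtle interactions" mentioned above; this only works as well as the commits are focused, since one giant commit makes every file look coupled.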
Some of the rules I follow:
(A) A short message with an imperative mood documents the "what". A longer body, in list form, documents the "why" and/or the "how." Always include the body unless it's a tiny commit with an obvious why/how.
(B) A pull request should follow the same idea as commits, in that it documents the how/why. It should also document the "how" of using any new features it introduces. If possible, it should include screenshots/videos/links of the changes so QA engineers / managers can quickly read it for high level expectations of the next release. Other commenters in this thread have mentioned that pull requests are where the documenting should happen. But good pull requests are just as important as good commit messages; they are not mutually exclusive. A pull request is just a higher level commit.
(C) Before submitting a pull request, use `git rebase -i` against the development/master branch to squash, reorder and fixup commits. For example, sometimes one commit is "solve problem A using method X," but you change your mind in a subsequent commit "actually, solve problem A using method Y". In that case, the two commits should be squashed together if they are in the same pull request. In general, do not be afraid to aggressively reorder and regroup commits in a pull request, if it improves clarity of the pull request as a whole. For this, I like to use a tool called rebase-editor [1] that makes interactive rebasing easy and satisfying.
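As a rough illustration of the reordering in (C), here is a simplified Python model of what git does to the todo list when you create follow-up commits with `git commit --fixup=<commit>` and then run `git rebase -i --autosquash <base>`: each "fixup! <subject>" commit is moved right after the commit it names and marked "fixup". This is a sketch, not git's implementation; real git also matches by hash and subject prefix and handles `squash!` commits, none of which is modeled, and `autosquash` is a hypothetical helper.

```python
from typing import List, Tuple

FIXUP_PREFIX = "fixup! "


def autosquash(subjects: List[str]) -> List[Tuple[str, str]]:
    """Build a rebase todo list (oldest first) of (action, subject) pairs,
    moving each "fixup! <subject>" commit right after its target."""
    todo: List[Tuple[str, str]] = []
    for subject in subjects:
        if subject.startswith(FIXUP_PREFIX):
            target = subject[len(FIXUP_PREFIX):]
            for i, (action, existing) in enumerate(todo):
                if action == "pick" and existing == target:
                    # Skip fixups already attached to this target so they
                    # keep their original relative order.
                    j = i + 1
                    while j < len(todo) and todo[j][0] == "fixup":
                        j += 1
                    todo.insert(j, ("fixup", subject))
                    break
            else:
                # No matching target in the range: leave it as a normal pick.
                todo.append(("pick", subject))
        else:
            todo.append(("pick", subject))
    return todo
```

The "method X, then method Y" example above maps onto this directly: the correction lands right after the original commit and is folded into it, so the pull request shows a single commit for problem A.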
You shouldn't be merging broken/exploratory stuff to master. Rebase or rewrite the commit so it's correct, self-contained, and others can understand it.
What's the point of adding a sentence or two to a commit message? I only include a keyword and the issue number, e.g. "Closed #41" or "Fixed #41". This links the commit to the GitHub issue, where users can see all the details they need about what was involved in the change.
That’s not particularly helpful if you’re looking back through git logs or at a git blame. Unless you’ve memorized the context of every github issue you have.
Also what if you move from github to bitbucket? Or self hosted git?
Git provides a great way to keep your changelogs directly next to the code, why not use it?
It's drastically less helpful if anyone ever wants to look at the code separate from whatever issue-tracker you're currently using. Are you confident those IDs will be preserved (in a useful form) if you ever want to migrate away from Github?
This assumes that you have access to the issues page. That's not always the case (when working as a contractor, for instance). Also, tickets are not always filled in properly, and changes are often not accompanied by a ticket. You get so many bonuses from writing good commit messages:
-> Overview of the latest changes without having to open additional links. Very handy for git bisects for instance.
-> Automatic generation of change logs.
-> It enforces changes that are purposeful and well scoped.
-> Knowledge stays with the Git repo. No data migration issues if you decide not to use GitHub anymore for instance...
-> And finally... it really just looks nice and professional. I would not take a project seriously if the Git log was just an aggregation of WIPs and Oopsies.
One of the many horrible things about git is that it barks at an empty commit message.
Of all the documentation that whatever you are working on is lacking, you want people writing commit messages?
Seriously, no one reads this stream of incoherent babble. If you want more documentation, start somewhere else, somewhere your documentation effort would actually be useful.
I'd say _add_ issue tracker reference, not replace the description with it
43ec6aa Fix error when the URL is not reachable (#1001)
4fe84ab Add error message if something went wrong (#1002)
753aa05 Add server fingerprint check (#1003)
df3a662 Fix shadow box closing problem (#1004)
Issue trackers like JIRA can be configured to search commit messages for ticket IDs, which gives you all the commits associated with a ticket.
Adding ticket IDs to commits has fallen a bit out of fashion since feature branches and pull request workflows became popular. But adding the ticket ID to commit messages allows for a trunk-based development style without losing the ability to review all the commits referencing a ticket.
Most of the time I write the ticket ID in the body below the subject, which keeps `git log --oneline` pretty while the tools can still pick it up.
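A Python sketch of the kind of lookup such a tracker integration performs, showing why an ID kept in the body is still found: the whole message is scanned, not just the subject. The Jira-style key pattern, the `index_by_ticket` helper and the sample ticket IDs are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed ticket format: Jira-style keys such as PROJ-1001.
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def index_by_ticket(commits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each ticket ID to the commits whose full message (subject or body)
    mentions it. `commits` holds (hash, message) pairs."""
    index: Dict[str, List[str]] = defaultdict(list)
    for sha, message in commits:
        # Deduplicate so a ticket mentioned twice in one message is listed once.
        for ticket in sorted(set(TICKET.findall(message))):
            index[ticket].append(sha)
    return dict(index)
```

Usage with hypothetical messages whose subjects stay clean for `git log --oneline`:

```python
index = index_by_ticket([
    ("43ec6aa", "Fix error when the URL is not reachable\n\nRefs PROJ-1001"),
    ("4fe84ab", "Add error message if something went wrong\n\nRefs PROJ-1001, PROJ-1002"),
    ("753aa05", "Add server fingerprint check"),
])
# index maps "PROJ-1001" to ["43ec6aa", "4fe84ab"] and "PROJ-1002" to ["4fe84ab"].
```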
Me too, but I put a full hyperlink to the github/gitlab issue in the body instead of just the number so that someone can just click and start reading what it's about. It also creates a back-reference on the issue page in some web-based issue trackers like Github.
43ec6aa [1001] Fix error when the URL is not reachable
4fe84ab [1002] Add error message if something went wrong
753aa05 [1003] Add server fingerprint check
df3a662 [1004] Fix shadow box closing problem
That makes the corresponding issue easier to find.
I don't really agree with this approach; you should use either tags or branch names to link issues with git, and commit titles are not the right place. It's very common for an issue to span multiple commits, so this solution feels insufficient. If you feel it's necessary, you can add the issue number to the commit message body, but as a title it's not that useful. Human-readable titles are the best option; if I need to find something I will be using git log or grep anyway.
I think the opposite tends to be more helpful. The title should give you as much information as possible at a glance since it's likely to be seen in a list of commits or as part of the history of a file or line of code.
The more verbose details and ticket id should be in the body since I'm more likely to care about them once I've pulled up the commit to review in detail.
Not to belabor the point, but once you're familiar with the contents of a given ticket (the "{Jira-1234}" reference in my example), the commit message titles' information density goes way, way up. Adding this ~12-character prefix results in useful, scannable, one-line commit messages. I see no downside.