`squash` is a tool and it's neither good nor bad; it needs to be applied where it makes sense.
The title is indeed click-baity. It would have been more interesting to read that the bug/mistake was caused as a result of squashing. It's not the case and I take issue with the way the author describes his commits.
The problematic commit is described as "Extract CreateTokenValidationParameters method", without an explanation of why the refactoring is necessary or what problem it is solving. It looks like it improves code readability, but it doesn't go as far as fixing the more glaring issue with global variables. Other commit messages seem to follow the same pattern.
In other words, the commit:
- has minor code readability improvements
- contains no useful message for the future programmer (why is it needed?)
- provides no new functional feature/improvement/business value
- is later found to contain a bug
I find squash/rebase/cherry-pick useful when reviewing my work and deciding what should go in the current pull-request. For example, a refactoring might be postponed for later if it is deemed too time-consuming or irrelevant to the current PR. Or, one can squash logically related commits together, add a useful message and merge them separately. The resulting commit log will still be bisect-able.
For beginners: there's a very good article with tips for Git Commit Messages[0] that helped me have git histories I enjoy reading.
I think you're missing out on the main advantage the article points out: lots of small commits, even with terrible commit messages, let you use tools like git bisect to find bugs. Squashed commits mean you're looking through more code, and above some small size, it won't be obvious what the issue is.
If what you're looking for is a specific breakage or result that doesn't depend on something "working", then each commit does not necessarily need to be working. You may also want to read about the git-bisect feature "old/new" which could help if you're just looking for a change and not a breakage.
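A minimal sketch of those neutral old/new terms, with hypothetical term names and tags:

```shell
# Bisecting for the point where a behavior *changed*, not where it "broke":
# --term-old/--term-new replace good/bad with neutral labels.
git bisect start --term-old=fast --term-new=slow
git bisect slow HEAD       # the tip shows the new behavior
git bisect fast v1.0       # this older tag still had the old behavior
# ...test each commit bisect checks out, then mark it with
#    `git bisect fast` or `git bisect slow`...
git bisect reset           # done; return to where you started
```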
More often than not, lots of small commits, with incomplete changes, will make git bisect completely useless.
It takes someone who, at the same time, makes 'messy' commits with horrible messages yet is diligent enough to never commit any breaking changes.
Not really.
With git bisect you know what you are looking for.
So even if you did make a commit with, e.g., a missing semicolon, you can still call it "good" because it did not cause the problem you are trying to debug.
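To add to this, `git bisect run` can automate that judgment. A sketch, where `make` and `run-my-check` are hypothetical stand-ins for your build and test commands:

```shell
# Exit code 125 tells bisect to skip a commit it can't evaluate (e.g. it
# doesn't build), so messy intermediate commits don't derail the search.
cat > bisect-check.sh <<'EOF'
#!/bin/sh
make >/dev/null 2>&1 || exit 125   # unbuildable: skip, don't judge
./run-my-check                     # exit 0 = good, nonzero = bad
EOF
chmod +x bisect-check.sh
git bisect start HEAD v2.0
git bisect run ./bisect-check.sh
```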
Unless there's a way to achieve the same thing without global variables. In any case, the decision to use them can be documented in a comment or a commit message.
Seems like the real bug was in his handling of JwtSecurityTokenHandler.
He claims to be an expert on dependency injection with two decades of automated testing experience. I wonder what was so hard about writing a test to cover this scenario?
It's not a good look to trash someone's career because they made a mistake. This happens to everyone everywhere — the only question is how well you handle it.
Point is that `squash` is a useful tool used by many other successful professionals. I'm allowed to disagree with the author's opinion of the root cause of his bug and the factors that made it hard or easy to debug.
Nobody is saying you can’t disagree. My point was just that you should focus on the topic rather than attacking someone you aren’t familiar with. Doing so was a distraction which didn’t make your argument stronger.
Seems like a form of thought policing which you didn't apply to yourself. The hyperbole that I trashed his career, and "not a good look", also appear like off-topic attacks.
The reason to squash commits is about more than keeping your commit history readable; it's about keeping history in a shape that makes it easy to revert a feature in one step if you run into issues.
If I roll out a rewrite of an endpoint and run into a weird issue in the QA environment, I'm a simple git revert away from fixing the issue. If I had spread that endpoint across 25 commits, I'd have to actually debug the issue in QA and figure out what I broke. Reverting quickly lets me debug the issue on my own time instead of keeping our test suite broken.
It's not always feasible, and I try not to be a stickler when people on my team don't do it, but the fact is, if you're working in a world where you're delivering code quickly into real environments, having a back-out strategy is paramount.
The problem with this approach is the local branch is no longer represented in the shared branch. So if I'm working on a larger feature and want to PR an intermediate part and continue working, I'm in for a bad merge.
If I want to merge my hotfix topic branch into both the release and the master branch, their commits won't match so I can't check if it's present in both automatically.
If a topic branch is left up instead of deleted after a squash merge, I can't even see that master is ahead of it!
Squash is an ugly hack that creates as many problems as it solves.
I desperately wish git had a "group commits" feature that let me manage a cluster of related commits as a single commit for the purposes of history-viewing, reverting, and cherry-picking.
> I desperately wish git had a "group commits" feature that let me manage a cluster of related commits as a single commit for the purposes of history-viewing, reverting, and cherry-picking.
Merge commits work fine for most of that; you just have to adapt to merge UX/commands:
--first-parent (to git log, git annotate, etc) gives you clean history viewing of just your "groups" (your merge commits).
--first-parent even works for bisect, allowing you to start by figuring out which merge commit brought in a change (and then dig into the merged branch itself, if needed, as a second bisect).
You can revert or cherry-pick merges if you provide the -m (mainline) flag to tell Git which parent to consider the mainline (usually the first parent, but not always, depending on your intended revert/cherry-pick). It complicates what you need to know about the operation, but if you are in the process of reverting or cherry-picking you should already be figuring out what your mainline is and expecting some possible complications.
I think sometimes the only "problem" with merge commits is that too few pretty UX tools default to a --first-parent view of the git graph, and they don't themselves provide good tools for picking that -m (mainline) for revert/cherry-pick.
I think it is an interesting tools problem: making interesting graph diagrams of the full git log graph is a fun problem and often looks great in screenshots, but it is rarely a particularly "useful" view for most users. Building --first-parent views by default, and introducing (and testing) graph drill-down user experiences, isn't as "fun" and doesn't produce as many shiny, colorful screenshots.
> I desperately wish git had a "group commits" feature that let me manage a cluster of related commits as a single commit for the purposes of history-viewing, reverting, and cherry-picking.
Maybe you can do something similar to that with git-replace?
Otherwise, you could tag your commits by putting a marker in the commit message and use `git log --grep` to search for those commits. Then you could build a small helper script that 1. greps for a tag, 2. loops over the found commits, 3. rewinds them or whatever, and 4. squashes the resulting commits.
Dunno, there's a bunch of other ways depending on how exactly you want it. But I agree that grouping commits would be pretty cool :D
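A minimal sketch of that message-tagging idea, with a hypothetical `[group:...]` token:

```shell
# Mark related commits with a token in the message, then query the group.
git commit -m "Refactor token parsing [group:auth]"
git commit -m "Add expiry check [group:auth]"
git log --grep='\[group:auth\]' --oneline   # list every commit in the group
```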
But doesn't that assume that you based your larger feature on the local branch, instead of the squashed public version? If you wanted to build on the previous commit, why wouldn't you build on the squashed version?
> If you wanted to build on the previous commit, why wouldn't you build on the squashed version?
alexmingoia proposed:
> That way one gets clean shared history while preserving local work history.
But if you keep on building upon the squashed versions each time you do a merge, you won't have convenient access to the local work history any more.
If you wanted to have access to the local work history you'd need to keep each branch alive still, each time based on the squashed history plus your individual commits up until the next squash.
I suppose you could cherry-pick the new branch onto the squashed version, but I'm still not sure of the value.
You have the local work history already, and that work is done to the point of it being squashed and pushed. Why would you need to keep referencing it to the point that it's inconvenient to have it in another branch?
> you'd need to keep each branch alive still, each time based on the squashed history plus your individual commits up until the next squash.
exactly what I do, and there are maybe just three or four branches I maintain for reference, very few squashed branches are useful a couple of weeks after they are released.
> I desperately wish git had a "group commits" feature that let me manage a cluster of related commits as a single commit for the purposes of history-viewing, reverting, and cherry-picking.
Isn't this what git does by default when you merge your changes? The merge commits group small commits together.
> if I'm working on a larger feature and want to PR an intermediate part and continue working, I'm in for a bad merge.
I usually deal with this by `git rebase`ing my feature branch on top of the shared branch as soon as the PR is merged. You sometimes still get merge conflicts with this approach, but they're always in the code you've just written, so they're usually pretty easy to fix.
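Sketched out, assuming the shared branch is `origin/main` and the long-running branch is `feature` (both names hypothetical):

```shell
# After the intermediate PR merges, replay the remaining work on top of
# the updated shared branch.
git fetch origin
git rebase origin/main feature   # checks out feature and rebases it
```

If the PR was squash-merged, `git rebase --onto origin/main <last-merged-commit> feature` avoids replaying the already-merged commits.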
The problem with rebasing is you might break every one of those commits, which defeats the purpose of chunking the work like that in the first place (since its no longer an honest reflection of what was happening/what worked at each point).
I usually rebase and then squash so that becomes a non-issue. I often find I want to commit more often than the code is in a working state, so I like to be able to erase that history later. I try to keep the whole branch small enough that it's fine being in one commit.
I don’t really like squashing on merge, since it destroys the commits I may have made. If you’re getting a PR from me and it has more than one commit, I have done that on purpose and you want to keep those separate.
My gripe with merge commits is they don't integrate nicely with `git blame`. If I'm looking through historic commits (to understand why a change was made, or perhaps to debug an issue) I'll often `git blame` the line and diff that commit. If the commit is super granular, I can't get the context of the whole change: I need to dig for the merge commit then look at that, which is faff.
If there's a way that I don't know to show merge commits in blame rather than the actual source change commit, then I'd be all over it. Until then, single (whole) units of change per commit.
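There is, in fact, a flag for this: `--first-parent` makes blame attribute lines to the merge commit that landed them on the branch (assuming merges weren't fast-forwarded). A sketch, with a hypothetical path:

```shell
# Attribute each line to the merge commit that brought it into this branch,
# instead of the granular commit that wrote it.
git blame --first-parent -- path/to/file
```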
1. The large majority of PRs I've reviewed have a single contributor. Additional contributors are rare. When they do happen, they're often a minority contributor or simply consulting on a PR. It's net neutral when all PRs are squashed in the same pattern.
2. Even with multiple contributors, most features have one leader. It's much easier to talk to that person (and have them delegate) than it is to piece together multiple contributions.
That's an interesting point, but it seems like one among many very good arguments for "right-sized" or "logical" commits, i.e. not 10 similar-shaped bugfixes and also not one-line no-context diffs -- how big should the PR and merge commit be though? Maybe you put the 10 similar-shaped bugfixes together into one PR because they review together easily, but each fix is its own commit, because they all logically stand on their own.
Your use-case is actually the cause of a common rule I've seen at work of requiring a ticket reference in each commit message, which allows looking up the original ticket and associated PRs, along with any commentary & discussion at the time the commit was merged.
On a big code-archeological dig, I often follow a path like run blame -> look at the diff -> pull the ticket reference -> find ticket in issue tracker -> read its description & comments -> find linked PR #'s in the ticket tracker -> open PRs & read diffs and comments -> repeat for linked issues if needed (and then as often as not still end up baffled)
One team actually kept an old redmine VM instance running mostly based on my personal use long after we'd migrated to JIRA, so... I think my approach may be a little unusual! At the least, doing better-sized commits would be a huge step for every case involving blame.
I personally really dislike merge commits because it makes the tree really difficult to follow in most visualizations. If I'm trying to follow the main branch only to a certain point the graph is polluted with all the "WIP" side branch commits between head and the commit I end up getting to.
It also defaults to causing the main to have a ton of commits with "Merged from XXXX branch" as the summary lines when that's not nearly descriptive enough to quickly find what type of commit I may be looking for.
> If I'm trying to follow the main branch only to a certain point the graph is polluted with all the "WIP" side branch commits between head and the commit I end up getting to.
Using merge commits doesn't have to mean that you don't squash at all. At one extreme, you include every single commit ever made on the branch, and at the other extreme, you squash the entire branch down to a single commit. Using merge commits, you can go for any option inbetween.
> It also defaults to causing the main to have a ton of commits with "Merged from XXXX branch" as the summary lines when that's not nearly descriptive enough to quickly find what type of commit I may be looking for.
Do you often read the log linearly? 99% of the time when I investigate history in Git, it's either through "git blame" or through "git log -S somestring" (search for commits that introduced or removed "somestring"). I rarely, if ever, just read the log as-is.
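For anyone unfamiliar, the searches mentioned above look like this (paths and strings are hypothetical):

```shell
git blame -- src/handler.c           # who last touched each line, and in which commit
git log -S somestring --oneline      # commits that added or removed "somestring"
git log -G 'need.e' --oneline        # same idea, but regex-matched against the diff
```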
I would equally argue there is nothing wrong with the local work history branches being remotely available. If I have multiple branches for one ticket nobody ever seems to care, they just care about which branch is in the PR. Besides, if you have a reasonable web UI for git, it shows it all merged together as one big changeset.
A merge commit works just like a squashed commit except it keeps all history. This is precisely why it's sensible to avoid fast-forwarding since that operation discards the fact that a branch existed in the first place.
It's better to always have merge commits. They can be reverted just as easily. I don't understand why merge commits aren't the default in git.
If you can fast-forward, the fact that the branch ever existed was irrelevant, since the branch is a direct descendant of what you based it on. Just with a different name.
It's not irrelevant. The branch represents a feature, a topic. It groups the commits you're merging into one logical set. This grouping of commits is exactly what will let you revert the feature later if it causes problems.
I've always been frustrated by losing my topic branches once they're merged and deleted, but can't bring myself to clutter my local branches by keeping them around, to the point of tagging them just to keep track. I like the sound of your argument and will look into how fast-forwarding affects my commit history vs. a merge commit, thx c:
This is where you realize that what's killing git is that git has no concept of branches whatsoever. Such a merge isn't "merging A into B (plus shove metadata as string into the commit message)", it is "merging A and B together", which is topologically identical, but semantically very distinct.
That's why Mercurial (esp. with evolve and topics) has forever my preference over git.
I can't figure out how to explain to Junior Devs (who have only ever known git), that they have a concept of a branch in their head that doesn't match the concept their tool of choice is giving them.
We talk about "branches" as logical sets of changes. We give them meaningful names, we construct the concept of Pull Requests and code reviews around the concept of a branch. We later refer to Feature X as having landed in master from branch Y. But git doesn't have any of those semantics. It has lots of ways of dealing with commits, and a facade of a branching model is just one more way of dealing with commits. Branches are not a first-class concept in git.
And certainly not like they are in our minds.
However, git is amazing at what it does! And if I was running the world's most popular OS kernel development team and was expecting to receive hundreds of patches a day via email from developers in whom I have limited trust, I would definitely start with git's model and change the way my brain works to match its semantics.
Instead, I find myself on a small team of high-trust coworkers who all talk about branches as if they really exist in our git history, and somehow I'm the crazy one for pointing out that every time we hit a problem with this mismatch the fact that we're using git is the reason that we can't have nice things.
Indeed, it is just a normal commit that has an extra parent commit…
A Git commit is always a standalone item linked to its parent commits. The actual content can be completely unrelated to the parent commits if you insist (but what would be the point of doing that?)
A merge commit is just a commit that has more than one parent.
> I'm a simple git revert away from fixing the issue.
I don’t think this is true /just/ for squashed commits as you can simply revert the merge. Squashing is also bad, because you’ll lose the history of the bug fix that prevented the deployment. Or at least someone will have a hell of a time deciphering the PR to reintegrate the original change and your bug fix.
After a few days, weeks, or months, the argument holds even less water, because code will likely depend on the commit in question.
War story: there was a time someone accidentally deleted a multi-GB table in production. The table would take hours to delete and replicate globally, so the entire company spent at least an hour deleting the feature from production to stop the database errors. There wasn’t any reverting of original commits. No one had time for that.
> If I roll out a rewrite of an endpoint and run into a weird issue in the QA environment, I'm a simple git revert away from fixing the issue
I'm firmly of the belief that any benefits that squashing brings would be better achieved with better tooling, rather than by re-writing history and throwing away potential debugging information. In this case, what you need is for git to make it easier to revert 25 commits in one go.
It was clearly a natural stopping point in that you stopped working on the code for the amount of time it took to switch. There's nothing wrong with encoding natural stopping points in your commit history, even when those natural stopping points don't always align with semantic stopping points (features/fixes/completed tasks). Sometimes those natural stopping points even encode data you might miss later: sometimes in making that switch you also mentally switch tracks and looking at where you stopped and where you restarted sometimes can remind you of things you forgot in between. I've found debugging cases where that helped me solve problems. (That said, in practice I rarely keep such "natural stop commits" myself, but I've also never been strict about it and sometimes keep them. There's nothing wrong with them and sometimes they are useful "waterline" markers in a larger effort.)
I don't make these commits often either, they exist almost exclusively to move code across machines. Actually, I commit much less frequently than most people, keeping most of my work in the staging area for long periods of time until it's ready to be committed. Probably a bad strategy, but it keeps my code clean…
My point is no matter what your commit strategy is, the reality of what actually happened is debugging info, and worth preserving to help debug later. While debugging and digging through history I'm often asking the question "what were they thinking?", and the info that you made a commit just to move to a different machine might make me e.g. consider it more likely that you left something out, or stopped a line of development early because you needed to do the commit earlier than you normally would. Squashing tries to hide the "how" and "in which order" so you only get the "what" of what was developed. But "how" and "in which order" are valuable pieces of information too.
If git had a concept of branches like Mercurial has, it would be a lot easier, as you could actually see what branch your commits were attached to.
(there are pros and cons of both approaches - this is a con of the git approach; I'm not knowledgeable enough about the esoteric details to comment on whether git actually made a bad choice or just a compromise)
If you use the standard merge commit message in git, then you can still tell what branch things came from when being merged. As someone that has used both mercurial and git, the trouble with named branches (and having to remember to remove them when merging to master) is one of the reasons why I prefer git.
Knowing the name of the branch is not enough to find the commits. With Git, a branch is just a kind of moving tag on the last commit.
The problem mentioned in this thread is rolling back a feature that was merged. The only solution I know is navigating the log to find the first commit on the branch from which to revert. Don't forget there may have been several merges from and to master, as well as commits shared with other git branches that should not be reverted.
In a Mercurial branch, each commit is tagged with the branch name. Unnamed branches, à la Git, are called "bookmarks".
> The only solution I know is navigating the log to find the first commit on the branch from which to revert. Don't forget there may have been several merges from and to master, as well as commits shared with other git branches that should not be reverted.
The only step necessary to revert the changes from a merged branch is to revert the merge commit. See:
The problem is if multiple people make other changes after yours. You can certainly go back to your tag, but you lose all their changes as well.
If you revert one specific commit it just does another commit with the exact inverse of what happened in the reverted commit, so all other changes are preserved.
The fix is very easy: forbid fast-forward merges, and then you can always revert the merge commit of a particular feature branch.
As for cleanliness of history, everything should be as simple as possible, but not simpler. Squashing and rebasing is destroying history, which often could be valuable, as OP shows.
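The "forbid fast-forward merges" fix can be a one-line, repo-local config. A sketch, with a hypothetical branch name:

```shell
git config merge.ff false      # plain `git merge` now always creates a merge commit
git merge feature              # revertable later as a unit: git revert -m 1 <merge>
```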
> If I rollout a rewrite of an endpoint and run into a weird issue in the QA environment, I'm a simple git revert away from fixing the issue. If I had spread that endpoint across 25 commits, I'd have to actually debug the issue in QA and figure out what I broke.
Surely this is why we have things like release tags and snapshots of previous versions?
Unpicking individual features is rarely simple even if you have got your git repo into an immaculate state, as things are interdependent.
I wish git had a builtin notion of two different types of commits: working commits and release commits.
I really like making tiny, continuous commits as I work. It's a great flow. git-revert becomes a Ctrl-Z on steroids. I don't want to clutter up the "official" history with all these tiny changes, many of which don't even compile. That breaks git-bisect and all kinds of other flows.
So the only option is to squash commits. But there's something deeply uncomfortable and unsettling about permanently re-writing history. Plus, it's nice to have a history of those working commits as an artifact. If I'm trying to unpack the reason that I did something 9 months ago, then seeing a replay of the code changes is super-useful.
I go one step further and work entirely in git rebase -i, where I build up a stack of tiny, incremental commits. This also lets me get small changes reviewed and committed on a nearly daily basis instead of building up days or weeks of work.
I've been wishing for a git GUI that lets me drag hunks around such a stack so I don't have to keep moving them by hand.
You can use merge commits to create your release groups. Use tools like --first-parent to limit the depth you see of the graph by default; that clutter can still be there, but you can take advantage of the nature of the graph itself to "declutter" simply by setting the "depth" you are working in.
You could get this behavior pretty easily with a double merge: “main” which contains the squashed commits and “history” which contains all commits as well as “main” merged back in post-squash.
I do see your point though.
Edit: the only reason to merge “main” into “history” is to enforce convergence.
There's an obvious upper bound. If you can't identify the last in-prod version that worked, and the first in-prod version that failed, then you have bigger problems than deciding whether to squash your commits.
Thanks to the complexity of how components integrate, even seemingly disjoint components' commits are going to matter when it comes to troubleshooting an issue.
This gets a bit more murky with mono-repos, but even microservices can combine to create complex production issues.
This sort of sidesteps the issue, IMO, of how hard it is to isolate a problem. Yeah sometimes things collide, but pretty often in my experience there has been one change that needs reverting or patching, not six changes that interleaved. Or rather, interleaving them at best increases the difficulty of the blame game.
By story number, by author, by timestamp, by looking for merge commits, by reading the commit messages - by common sense, basically. If someone has just pushed broken code to master, they usually know exactly what they've just pushed. This doesn't seem like a real problem to me.
You very rarely want to rollback a major feature. I’ve never seen it done except right after a deployment, in which case you can revert the merge commit instead. Bisect to find a breaking change is a very common operation. Squashing is bad if you expect to work on your project long term or if others may have the same commits as you from working on the same feature branch.
One can branch and then merge features, and get easily revertible commits and a full history at the same time.
Of course, there are caveats for reverting a merge - finding the right parent adds another failure-prone step, and your team can get in all sorts of trouble if they try to work with the branch without reverting the revert.
Wouldn't it be preferable, then, to avoid squashing your commits, and then tag your releases? This gets you the best of both worlds, you can still back out to a known good version in case of issues, and you can still bisect to narrow down the exact commit that caused your problem.
Bisect is the killer feature of git, for me. Squashing releases takes that superpower away.
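A sketch of that combination, with hypothetical version numbers:

```shell
git tag -a v1.4.2 -m "release 1.4.2"   # known-good snapshot you can redeploy from
git checkout v1.4.1                    # back out: build the previous release
git bisect start v1.4.2 v1.4.1         # then pinpoint the regressing commit
```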
Completely agreed — squashing commits isn’t desirable in every circumstance, but if all the commits pertain to a single feature then it makes complete sense to me.
>I've always disliked Git's squash feature, and here's one of the many reasons to dislike it. Had this happened in a code base with a 'nice history' (as the squash proponents like to present it), that small commit would have been bundled with various other commits.
This question always boils down to one metric. Which happens more often:
1. A circumstance arises where the dynamics of fine grained commits make the problem more obvious.
2. I have to interact with the git history
Now (and this is just me), I interact with the git history for one reason or another multiple times a day, every day. I have been using git for 10+ years and I haven't yet encountered the first circumstance. That's not to say I never will, and when I do I'll probably pine for the fine-grained commits that would make it stand out.
For me the clean, easily browsable history benefits me every day. Furthermore because of how often I have to use it I should optimize it heavily at the expense of just about anything else unless the benefits of that item can be realized with similar frequency. The access to fine-grained commits hasn't helped me once in the 10 years I've been using git. With that calculus in mind I must conclude that I should squash the commits and pay the piper on the other thing when/if that bill comes due.
EDIT: Imagine this. If clean browsable history saves me 5 minutes a day then it has saved me ~10,000 minutes since I started using git. That equates to 1 full time working month. I really can't think of a circumstance where having access to fine grained commits would deliver a similar net savings.
For the best of both worlds, I like the "git alligator" workflow: always perform a merge commit when adding a feature, then use `git log --first-parent` to browse the history without seeing the intermediate commits.
Having fine-grained commits and having a clean git history aren't mutually exclusive. If you create and submit small PRs and git-squash those PRs into the main branch, you'll get the best of both worlds.
FWIW, the way I use git results in me committing a lot. I treat it as, essentially a save point that I can undo to if I need to. The history for my personal branches is a mess of broken tests and false-starts on code paths that just aren't going to work. Useless for anyone but me (but fantastic for giving me an hour-by-hour breakdown of what I've been working on and what attempts I've made).
>If you create and submit small PRs and git-squash those PRs into the main branch, you'll get the best of both worlds.
I definitely advocate for this. But I've worked at some spots with "never squashers" who deliver every PR as 119 commits and tell me this yarn about "the time having all those commits totally saved their bacon". I've just never regretted squashing my commits into something manageable.
The problem is not with squash. The problem is with the "PR" itself.
The pull request workflow popularized by Github and adopted by others (like Gitlab) is just bad for code reviews. It forces you to choose between squash and history. Making several smaller PRs is not the universal solution either: in a lot of cases those PRs depend on each other, and dependent PRs are also a pain in the ass in Github's PR workflow.
In a better code review system you don't have to make that choice. For example, in Gerrit (I _think_ this is also similar in Phabricator, but I'm less familiar with that), every code review ends up as a single commit when it's merged (if you configure it so, and that's the default configuration), and you don't lose the history during the workflow: all the intermediate states are stored as git refs on the server (you can't use bisect on them, but they are there, and you could write your own custom script similar to bisect to loop through them). Code reviews with dependencies are also handled naturally.
GitHub could _drastically_ improve this workflow by allowing a collaborative commit message to be part of the review. That way the PR can use small commits, and the main history can use atomic commits with an agreed-upon message useful for future code archaeology. Gerrit gets this correct.
I and several others I'm aware of spoke with a product team at GitHub about this over a year ago, but nothing seems to have come of it. In the meantime, my default is for any repository I maintain to have atomic commits with fully descriptive commit messages only.
The kind of use case described in the article is not, to my mind, a good reason for the purpose of a commit message - describing for future maintainers _why_ a change has been made - to be defeated.
Maintaining safety equipment takes time and effort we could be using for something else. But we do it anyway because when we do need it, it trumps all other concerns.
It’s better for people to get comfortable dealing with commit history than kicking the can down the road by getting the abridged edition. You’re infantilizing your coworkers, and hamstringing the people who do deep root cause analysis at the same time.
>Maintaining safety equipment takes time and effort we could be using for something else
You haven't established the relationship between "safety equipment" and your preferred style of commit history. You haven't shown one to be safer than the other.
>It’s better for people to get comfortable dealing with commit history than kicking the can down the road by getting the abridged edition.
I do deal with the commit history, just a simpler, more meaningful version of it. If you could write code perfectly with as little effort as possible, wouldn't you do it? Well, we're not perfect, but we can go back and change the history so it looks like we were :)
I usually understand "squash" to mean "bundle everything in a PR into a single commit". Which can indeed break bisect workflows.
There's a middle ground though: rebase your commits, but not necessarily into a single one. Before I submit a PR, I rebase my branch (which has lots of small commits, some of which undo previous work or are a work-in-progress), and make sure that every commit is as small as it can be without including work that is halfway done (so all tests still succeed), and have a clear description of what they're doing.
I then have both a nice history, and I can bisect to find a problematic commit, or inspect the commit history of a single line to get more context about it.
If you work on a very large project with many released versions and need to cherry-pick fixes to back-port, you'd be very, very, VERY glad that each branch got squashed.
The story told in the article mainly shows that you should keep your PRs focused. You can squash as long as you don't do 2-week-long, 37-commit branches.
Perhaps we could 'have it all' if git introduced a way of bundling commits, a bit like code-folding in an IDE. The bundles could be expanded when you need fine-grain detail of development history, but by default would be unexpanded and would behave like a single big commit, avoiding the issue of always seeing overwhelming detail.
Maybe... sure that's fewer commits to cherry pick. But when some reconciliation is needed to make a patch apply to different version branches, it's sure a whole lot easier to manage if the commits are small.
When rebasing my branches, most PRs will still be a single commit, some will be two or three. That's not that much worse than squashing, and far more informative.
I wish more people would naturally come to this conclusion in their careers, but many people I have worked with just don’t care about maturing their git disciplines. All it takes is one teammate who thinks it’s a waste of time to undo the progress of everyone else.
For context, my career has been in web development start ups, which generally reward the cavalier and tolerate the careful.
It's interesting that you note it only takes one person to undo the progress -- by the reverse token, every person who works in this way adds value to the practice and the history, which is why (and how) I've advocated for and to individuals to spend time on it.
Any one person can muddy the history up (by ruining bisect, say), but any one person can also improve it (by leaving good notes and right-sized commits whenever possible).
I notice that I get sloppy when I'm working on a repo with others that are sloppy as well. I guess the reason is that, if looking at the Git history is not useful half the time (because you end up at uninformative commits by other contributors), then you stop looking at the history, and then there's not much reason to leave it in a super-clean state anymore...
I think it stems from the lack of experience with actually having to backport/revert non-trivial code changes in a large repository used by many people. For most engineers version control is just seen as a mechanism to collect credit for ones work. It's only when you get into these thorny situations do you appreciate the power of git.
My commit history doesn’t always reflect my mental processes in a way I expect others to follow. Particularly when I’m rationalizing committing work in progress before leaving for lunch or before writing experimental code. So rebasing to consolidate a couple commits and reorder some others is helpful.
The commit history is performance art. You don’t need to know that I left out a comma. Or that I tried to make a bit of code easier to read but introduced regressions in the process (that also breaks bisect). You do need to know that I changed this code over here because I changed the code over there and broke something, because I either missed something or you’ll have the same issue the next time.
I admit that it takes a small bit of skill and experience to get to a place with a version control system where doing what you suggest is easy and does not consume meaningful time, but I very strongly feel that it is worth the investment.
Same here. I especially love `git rebase -i <base>`: it opens an editor and gives me the option to edit specific commits, reword a commit, squash multiple commits together (I don't use this one often), or do a fixup (which basically merges a commit into the previous one).
I just learned about that recently, but haven't been able to put it in practice yet. The problem I encounter is having to specify which commit to fixup to when calling `--fixup`, which I think means having to look at my commit history. Instead, my process so far includes just describing the commit I want to fixup in my commit message, in a way that makes sense to me when I'm going through the interactive rebase.
Yes, that's exactly my process. In fact, I'm not even sure if I could predict what would happen when calling rebase without `-i`. Basically, my shell just autocompletes "git reb" to "git rebase -i origin/master".
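For anyone who hasn't tried the `--fixup`/`--autosquash` flow being discussed, here is a minimal self-contained sketch in a throwaway repo. The file name and commit messages are invented, and it assumes git >= 2.28 (for `init -b`); `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so no editor opens:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"

echo base > file.txt
git add file.txt
git commit -q -m "Add feature X"

# Later you spot a typo in that commit; record a fixup against it.
echo fixed > file.txt
git commit -q -a --fixup HEAD        # message becomes "fixup! Add feature X"

# --autosquash reorders the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --root

git log --oneline                    # a single "Add feature X" commit remains
```

The upside over describing the target in the commit message is that the rebase todo list comes out pre-arranged; the downside, as noted above, is that you have to identify the target commit when running `--fixup`.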
An easy way to do this is to perform a 'mixed reset' from the tip of your branch back to the mainline commit where you started work. (Use a new branch, so that your original branch "backs up" the work.)
This will leave all of the final changes in the index, "squashing" any churn. From there, individually stage lines/chunks into a small handful of atomic commits for the PR.
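The two steps above can be sketched like this in a throwaway repo (branch and file names are invented; assumes git >= 2.28 for `init -b`):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo v1 > app.txt && git add app.txt && git commit -q -m "mainline"

# A feature branch with messy intermediate commits.
git checkout -q -b feature
echo v2 > app.txt && git commit -q -am "wip"
echo v3 > app.txt && git commit -q -am "wip again"

git branch backup          # the old tip "backs up" the messy history
git reset -q --mixed main  # branch pointer moves; working tree keeps v3

# All the churn now shows up as one net change; stage it into atomic
# commits (use `git add -p` to split chunks; here it's one commit).
git add app.txt
git commit -q -m "Update app to v3"
```

After this, `feature` carries one clean commit on top of `main`, while `backup` still holds the original two-commit history.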
But you need to deal with those conflicts before merging anyway? And they're a lot easier to deal with in the context of the individual commits than when dealing with one big merge commit, I find.
The issue is that when you rebase, each commit in the chain gets re-applied, one by one. If you change something in an earlier commit that affects later commits, you could end up having to fix the same or similar conflicts several times before the rebase finishes (even with git's conflict resolution cache).
In contrast, if you instead merge, you only have to resolve conflicts once, for the merge commit.
That being said, I tend to be very fastidious about cleaning up my history before pushing, even if it means tedious, repetitive conflict resolution. But I do understand that other people might not think it's worth the trouble.
I think that can be resolved by rebasing onto the first commit of your branch to first make sure all the commits are minimal, and only then to rebase onto your main branch.
That said, I must say that I've personally experienced that as a pain, so I don't actually do it.
Agreed. I used to rebase interactively but have since switched to a four-step process. If you have conflicts, you'll deal with them all at once when you merge in `origin/master`.
The most striking thing in this story for me has nothing to do with commit squashing. Faced with something that suddenly stopped working, the developer went on a wild goose chase, first decoding their JWT -- presumably the same one that "used to work" -- then looking for possible framework bugs or misconfiguration, before looking at their own code, the single most likely place for the bug to be. They even linked to "select is rarely broken" but failed to internalise the main lesson behind it.
Whether they squashed their commits or not would make a far smaller difference to the time taken to spot the bug than simply not assuming that it's everywhere else than their own code first.
Because 19 times out of 20, that "wild goose chase" finds the bug faster.
As there is no perfect debugging methodology, sometimes whatever method you use will go wrong and you'll end up on the bottom of your list of things to check, or worse, right off the bottom of the list. (Those are some bad days.) That is not, itself, proof that your list is broken. After all, the author found this an unusual enough experience to write a blog post about.
Over the years I've figured out that you can not only bisect in your development history, but also in the program runtime. Start with "print('1/2')" or whatever around halfway through the runtime of your code and see if you get there, then move to 1/4 or 3/4, etc.
This can be done with break points, of course, but in my setup adding a print is usually just faster.
Starting with your own code is going to find the bug faster far more of the time than starting with "maybe the JWT got randomly broken or my JWT library is broken" or "maybe ASP.NET is broken".
> Because 19 times out of 20, that "wild goose chase" finds the bug faster.
This particular goose chase involved digging into everything except the developer's own code. 19 times out of 20, the issue is with your own code. OP even acknowledged this in the blog post, but his actions didn't align with that acknowledgement.
Do people out there actually squash commits? Granted, I didn't change many workplaces in my career, but at no place I've worked did people squash commits. What's even the point of it? It's not like people routinely read the commit history, and when they do, they really would like a complete story, not 20 gargantuan commits that contain 3 years of development.
I find the individual commits on feature branches to be more noise than signal after they are merged. They can be useful during review, sometimes, but mostly I want a cleaner history.
> they really would like a complete story, not 20 gargantuan commits that contain 3 years of development
That sounds like maybe we split up work differently. 3 years of development for me or my team would likely have hundreds+ of merge/squash commits, not 20 large ones.
What do you use your "cleaner history" for? I seldom go back and look at historical commits unless I'm debugging, in which case I'd prefer to know what actually happened at the time with as much information preserved as possible.
A clean history with proper commit names can work as a changelog for your app. We actually did that and it worked quite well. In Azure DevOps the release pipelines have access to the list of commits since the previous release, so one only has to take the titles of those commits to build it.
I’d prefer that too, but preserving all commits often leads to things like: ‘Changed file 1’, ‘Changed file 2 and 3’ etc.
You’ll have a ton of noisy commits that together make up one full feature. In this case having all that noise squashed into one commit with a proper description is much nicer.
Isn’t that what rolling back releases for? i.e. revert the container (or whatever your unit of release is) to the previous version? If you’re removing individual commits and re-releasing (rolling forward), you’re still at risk of being down, because you’re not going back to a known good release.
And what do you think about the linked article that shows how it's easier to find exactly which code is buggy if you have it in small commits instead of one big squashed one?
But surely you would like to squash those merge commits as well, at some points? Hundreds of merge/squash commits pollute the history almost as much as thousands of normal commits, so after a large feature set is done, one should just squash all of it into a single commit.
That way, you can have "nice" development history where tags for the old versions are redundant since they match commits one for one:
    $ git branch
      main
    $ git log --oneline
    abcd000 (HEAD -> main) version 5.2
    abcd111 version 5.1
    abcd222 version 5.0
    ef01234 version 4.7
    9876543 version 3.4
    fedb123 version 2.12
    dedc456 version 1.128
The git log isn't a release history, it's a development history. Squashing commits throws away context that could be useful in any number of ways.
There's nothing wrong with squashing a bunch of commits that really should have been a single commit from the start:
- Update X to do Y
- Fix typo in X
- Add "foo" option to X for when Y is a bar
But most of the time "features" consist of many changes: we add one function we're going to need, then another, then yet another... Then we change an API to expose the new functions, then extend the UI to make room for it, and finally put it into the application and pull all the strings together.
Squashing all these into one commit is just a bad idea. You can always do git rebase -i before merging to do minor fixups or even reorder your commits (like when you notice a typo after several other commits), but completely removing all granularity... just no.
When you have to find a subtle one line breakage that happened years ago, you’ll be happy that you had a single commit with 30 lines of code instead of a squashed commit with 3000. It makes it much easier to isolate the case of a problem and fix it in many cases.
I’m asked “how long has this been broken and what caused it?” and answer it via git blame or bisect probably every few weeks. This is life on legacy projects.
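When each change is its own small commit, that hunt can even be automated with `git bisect run`. A self-contained sketch (the repo, files, and planted "bug" are all invented for illustration; assumes git >= 2.28 for `init -b`):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"

# Eight tiny commits; the "bug" slips in at commit 5 and stays.
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > counter.txt
  if [ "$i" -ge 5 ]; then echo bug > state.txt; else echo ok > state.txt; fi
  git add .
  git commit -q -m "change $i"
done

# A check script that exits nonzero on the broken state is all bisect needs.
printf '%s\n' '#!/bin/sh' 'grep -q ok state.txt' > check.sh
chmod +x check.sh

# Mark HEAD bad and the root commit good, then let bisect drive.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run ./check.sh >/dev/null 2>&1
bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $bad_subject"   # -> first bad commit: change 5
```

With 30-line commits this lands you on the exact change; with a 3000-line squash it only lands you on the squash.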
I think you just highlighted one source of the disagreement. Nobody on any team I work with would accept a PR with 3000 lines of change. Each team had a policy, whether formal or informal, to break large work items across PRs so they were more easily reviewable. I would say that 300 lines changed is getting towards the bigger end of what we would accept. If your choice is between 30 or 3000 lines changed per commit, then for sure I'd pick 30, but above either I think a 300 line limit is more sensible.
Yeah, that’s nice in theory, but in practice the situation is often less ideal. Some features, partially implemented, would break existing functionality if not completed, and merging those upstream prior to total completion is therefore impossible. And on the other side of things, some environments and features require large, systemic changes. This is, surely, an organizational failing, but one that we must adapt to.
Then that's the other half of my point. If you can limit PRs to 300, then merging PRs into a single commit is better. If you can't, then maybe squashing isn't ideal.
That's not a squash or not problem, that's a your tickets are too large problem. If you normally end up with 3000 line feature commits, you're trying to do too much with individual feature changes.
Are we pretending that we, as devs, have much say in this? If our management says they want a feature that's going to take 3000 lines of code, you don't have a choice. It's nice if you're somewhere where you can roll out feature MVPs, but in some environments you don't get that luxury.
And some features are just monsters, either through the nature of the feature or architectural choices that were made before it was conceived.
I mean, I’m the one arguing for smaller commits, so yes, if you make it 10 commits via 10 prs instead, that’s fine with me. But if all of that has to go out at the same time, that’s not any different than one pr with 10 commits (and probably worse because it’s harder to see all the changes together).
I think based on this discussion I can identify one situation where squashing would be good: when you're doing repeated trial-and-error commits on a single file. That way, squashing won't turn it into 3000 lines of code, it just hides the unsuccessful attempts.
Yep, but many tools (most notoriously git bisect, famous from the article) don't understand "--first-parent", which is ridiculous.
The only way to have a clean history that all tools accept as clean is basically to outlaw merges.
> That sounds like maybe we split up work differently. 3 years of development for me or my team would likely have hundreds+ of merge/squash commits, not 20 large ones.
With the people I've worked with, I'd say most don't commit at all until they think the code is "ready" and they commit all at once. In the teams I've worked with, squashing vs not squashing isn't the question. I just want them to commit/push as soon as they've hit a stopping point or at least once a day. Maybe the people you've worked with are good with git but I am not that good with git.
I'm still stuck on 6: Resolve a merge conflict on git exercises because I made one too many commits and now the exercise says I have too many commits.
The argument for squashing is that minor updates like spelling, renaming, or test fixes during initial development can really clutter the history if they each have a commit. Many would rather see the actual change "Update X to use Y instead of Z" in the history, and minor details like "Fix mock in XTestCase" or "Perform renames from code review" within that single commit.
I'm sort of agnostic on this issue, but I do feel the article's author kind of overstated it here. Say 5 to 10 commits of this size were squashed together. Git bisect would've taken him 90+% as far and he'd have to read code or manually trial and error changes just slightly more. The binary searchable problem space would be slightly smaller, and the linear manual effort space slightly bigger. Less good, but really not that big of a deal.
For me, it's the other way round: minor updates like spelling, renaming, or test fixes during initial development can really clutter the major updates.
If i am making a commit that makes a complex but important change to some significant application logic, i want that commit to contain that change and only that change, so that when i have to re-read it a year later, it's completely obvious what i did and why. Bundling a load of refactoring and cleanup in there is a significant speedbump for my understanding.
Years ago, a sage pointed out the argument for squashing is really an argument for better tools. Imagine if you could flag commits as being of two types - major/minor, significant/insignificant, feature/refactoring, foreground/background, melody/rhythm, etc. Then imagine if the tools would by default hide, roll up, or otherwise de-emphasise the commits of the latter kind. This whole apparent dichotomy would go away in a flash.
This idea is floating around in the Wiki world. I believe it was Ward's Wiki that introduced a 'minor edit' checkbox in the editor; if a change was marked as a minor edit, it wouldn't be shown on the recent changes feed.
You can imagine other ways to get somewhere similar. For example, you could have a special kind of commit that just groups a previous run of commits, and the tools could show that and hide the members of the group by default. There are probably many other ways to do this.
There is a special kind of commit that "just groups a previous run of commits", it is called a merge commit. Tools like --first-parent can restrict git log/git blame to just "top level" merge commits. The only real change is that isn't the default in most tools, but they don't have to show the full graph, they could better focus on a specific depth by default.
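As a sketch of that grouping in practice, here is a throwaway repo (invented branch and file names; assumes git >= 2.28 for `init -b`) where `--first-parent` collapses each merge into one "top level" entry while the detail stays reachable:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo 0 > f.txt && git add f.txt && git commit -q -m "initial"

# A feature branch with two work-in-progress commits.
git checkout -q -b feature
echo 1 > f.txt && git commit -q -am "wip 1"
echo 2 > f.txt && git commit -q -am "wip 2"
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature

git log --oneline --first-parent  # top level only: "Merge feature", "initial"
git log --oneline                 # full graph: all four commits
```

`git blame --first-parent` works the same way, attributing lines to the merge rather than to the individual branch commits.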
I squash commits. Well, I rebase so that changes are logical rather than historical. This is because when people read the commit history, which actually happens regularly, they would actually like a complete story, not 15 commits of "fix audit", "fix review" and "fix typo" - let alone refactors where previous work in the PR is thrown out, which you can by definition never care about.
Commits so small that they're nonfunctional also break bisect.
I'm suggesting you don't make them in the first place.
If you do make them by mistake - and everyone does sometimes, certainly including me - then sure, commit --amend or squash them before pushing them up.
The squashing the article is talking about is collapsing functional, distinct commits into one when merging to the trunk. It would be useful if we had distinct terms for the two kinds of squashing; the git command `git commit --fixup` and its associated interactive rebase operation suggest the name "fixups".
They're the same technical operation, but they're used in different ways, and it's the difference in use that matters. It's useful to be able to distinguish them. It's never too late to improve our terminology! How long had git diff --cached been around when git diff --staged was introduced?
I'd moved workstation by then. If distributed source control is good for anything, surely it's reducing reliance on local state!
I don't think there's an easy answer here. I think there are good things to be said for a readable and bisect-able version history, but also that if you're not preserving your real commit history in a way that's backed up remotely then something's probably wrong.
I'm beginning to think this area, this schism into two schools of thought, is really a signpost that there's something lacking in git's branching. Or something everyone is missing, including me :)
Bisect doesn't have a problem with intermediate broken states, actually. It's why git bisect skip exists. It may not be able to pinpoint an exact commit though.
Bisect does descend into my colleague's feature branch with 20 broken commits and unrelated content though, when I just want to bisect the main branch. That's infuriating and IMHO makes bisect almost useless. That there is no git bisect --first-parent is one of the greatest mysteries of git.
I often do intermediary commits that don't compile or break something significant, only to make sure that I don't lose the code. When that happens I always squash my commit afterwards once the code is in a usable state.
The alternative is making bisecting harder which is not something that I want. I want every commit to compile and be testable individually. That definitely doesn't mean that I think it's a good idea to have "gargantuan" commits.
An ideal commit should contain a single, atomic change to the codebase. Not more, but also not less.
I'm a bit confused on your comments here.
You state "I often do intermediary commits that don't compile ... to make sure that I don't lose the code"
And follow that with "I want every commit to compile and be testable individually."
I'm the same as OP. I commit constantly when hitting a natural stop point or when taking a break and push up to my branch/pr. I hate having code locally that hasn't been pushed to the origin.
But doing this means I have many commits that don't mean anything or are unfinished and in an uncompilable state.
So I git rebase and massage the history to make more sense.
I mean that I do many small "crap" commits while developing, once I'm ready to merge into the main branch I clean the history to get proper, atomic commits.
I squash commits on large PRs on a case-by-case basis, especially when they have a lot of very small and silly commits (fix this, fix that, wip, ...) but implement a very specific feature which required a lot of experiments that didn't end up in the final PR. And yes, I use the commit history very frequently, so it's important for me that this doesn't contain too much noise.
If I need to bisect a bug that was introduced by the PR I still can do this in the original branch.
Ideally, that would give you the best of both worlds. But some places delete the old branches after they're merged. It would be nice if there were an easy way to hide or rename old branches so that the in-use ones stand out.
Everyone who works on Linux must rebase and squash commits so that every commit builds and passes tests, and does a single thing. More about how Linux uses git:
I'm probably 75/25 on this. 75% of the time I'm gonna squash, because my entire commit messages is along the lines of "got to this point" or "fix frotzolate when x=7". 25% of the time my first round of squashing yields good commit messages that are isolated and complete, so it's better to leave those as separate commits when merging into the mainline.
I also aggressively reorder my commits. When it gets to yak shaving you basically develop in a stack, so you end up with a half-functional commit to system A at the top, then a complete commit to system B, then the finishing commit to system A, so it's better to just rearrange it so that you have one system B commit followed by one system A commit.
But I do routinely read the commit history (using git blame) and no I don't want to see the "complete story" of my past self having "got to this point"--I want the documented MR commentary.
Squashing is kind of a blunt tool that is useful sometimes, but it can be perfectly replaced either by crafting your commits with more care or by doing interactive rebasing.
I once had to push back on a developer who wanted to make 20 commits per PR, titled like "wip", "wip", "wip", "fix bug", "fix mistake", "format code", in a PR that only changed about 10 lines of code in the end.
In this case, the main problem for me was that git blame became useless because this developer was touching way more lines than necessary and then undoing it using the code formatter.
I think what you really want is neither of squashing nor retaining: what you want is to take the messy history you had and then reconstruct a history that makes sense and commit that; thankfully, git makes this easy. I hate it when people leave messy thoughts in the commit history as I do use the history, and it is even worse if they just leave mega-commits. I want to see easy to review step by step organized thoughts designed to help the reader appreciate the series of steps needed to bring you from where you were to where you went.
It really depends on where you work and how your company's repo is organized. For instance, where I work 20 squash commits would represent, at most, 10 minutes worth of commits to the monorepo. Not squashing on merge would quickly turn thousands of daily commits into tens of thousands. It's already impractical to find suspect commits via git, and we typically use our code review tool to find the change that broke something (and that change includes the developer's full branch history). Adding more commits would needlessly slow down the tooling that displays 'git blame' results (and likely other commands).
I suspect there is a pattern in the comments here. People who work on small teams with granular repos that, individually, don't see a lot of daily activity think squashing is bad and erases valuable history. People who work with large repos that see high commit velocity (like a monorepo) think squashing (at least merge squashing) is beneficial and don't see the loss of information as problematic because it's hard to access it in the first place. Maybe I'm just projecting my own opinions on this; I'd like to hear perspectives that conflict with my assumptions.
You are definitely correct. People who argue against squashing have not worked on 10+ years old actively developed repositories, or in big enough teams.
A commit is a change. A change has a ticket. A ticket is a small piece of work that does not result in 3k changed lines.
This ensures that the change rationale is fully documented and easily identifiable.
Nobody needs those "fixed typo" commits. Nor the "implemented function A" commits. What IS the change that you're doing? What functionality? That's the most comfortable commit granularity to debug imho.
If you always work in feature branches and commit often (which you should IMO) it makes perfect sense to squash on merge. It gives you a nice history that only contains relevant commits instead of having a bunch of “added tests”/“fixed XYZ”/“remove debug log”/etc commits.
With some discipline this makes the commit history actually worth reading and makes git blame a useful tool.
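Outside of a hosting platform's squash-merge button, that workflow looks roughly like this with plain git (branch, file names, and messages invented; assumes git >= 2.28 for `init -b`):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo base > f.txt && git add f.txt && git commit -q -m "initial"

# Commit often on the feature branch, however messily.
git checkout -q -b feature
echo a > f.txt && git commit -q -am "added tests"
echo b > f.txt && git commit -q -am "fixed XYZ"
echo c > f.txt && git commit -q -am "remove debug log"

# Squash on merge: the net change is staged, and you write one real message.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Implement feature Y"
git log --oneline   # two commits on main: the feature and "initial"
```

Note that `git merge --squash` stages the combined diff but deliberately records no merge parent, which is what keeps the mainline linear.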
# Implement feature
# Whoops fixed issue in feature i just implemented
# Add in whitespace
# Remove whitespace
# Forgot place to add in whitespace
# Fix variable name for feature
I can see a good case for separating #2 from everything else. Or depending on the change, maybe squashing #2 and #3 together separate from 1 and 4.
It's a good idea to isolate the bugfix so that reading the diff is clear. Stylistic changes such as #2 and #3 can be separated and called out as "should produce no behavior change" sort of updates.
I do, all the time. I highly recommend others do it too. I like to keep to 1 commit per ticket (even a super large ticket that might take me weeks of development). But having a commit history like the fella in this story is nice too. So what you should do is work in a feature branch. On the day of deployment I'll squash, and cherry pick (in my case to a train). If the deployment goes bad, and my code is at fault, then all that is required to fix it is a single revert.
If you want to continously deploy to production (with real users using it) you need a process to very quickly revert bad commits from it. At my last company with hundreds of developers, we would push to prod several times a day. Each deployment would have several devs changes included. A clean linear history is essential to making that process work.
You get a similar benefit (a single revert to revert it all) via "git merge --no-ff BRANCH". If the BRANCH is a fast-forward (i.e. it's been rebased to master/main & tested before merge) then you get both benefits of a clean history and an easy revert, for little downside.
Keeping the history of each incremental change, even in a branch, is IMO too useful to give up.
Note I'm not advocating keeping those "fixed typo in previous commit" fix-up commits: those should be properly fixed up _before_ merge, by judicious use of git rebase.
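A sketch of the `--no-ff` merge-and-single-revert flow described above, in a throwaway repo (invented names; assumes git >= 2.28 for `init -b`):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo base > f.txt && git add f.txt && git commit -q -m "initial"

# A (rebased, tested) branch whose individual commits we want to keep.
git checkout -q -b feature
echo step1 >> f.txt && git commit -q -am "feature: step 1"
echo step2 >> f.txt && git commit -q -am "feature: step 2"

git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# If the release goes bad, one revert of the merge undoes the whole branch;
# -m 1 says "revert relative to the mainline parent".
git revert --no-edit -m 1 HEAD
cat f.txt   # back to just "base"
```

So you keep every incremental commit in the graph, yet rolling back is still a one-commit operation, which is the "best of both" being claimed.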
Wait, I must be missing something. I too develop in feature branches, and then they're merged into develop or main (with a merge commit, obviously). Reverting that is also only a single revert. The only problem is that sooner or later those feature branches do get deleted, and if you squashed, you lose all the non-squashed history. If you don't, it's still in the develop/main branch.
Personally I think of working in a branch as the process of creating a patch set. A patch in a series should only depend on its predecessors. You can submit more than one patch in a pull request.
I would never send out a patch set for review that includes all the "oops" "typo" "iteration 50" etc. commits I make as I work on code. Those are pure noise.
However, git is a tool for development, and during development I should be able to use git in whichever way is convenient for me: commit, rewrite and do whatever the hell I want with my local history. Not having that freedom is the primary reason why working with most non-distributed version control systems is such a pain.
When it comes to actually merging patch series to master, I like not doing fast-forward merges, since you maintain a natural grouping of the applied changes.
We go through ~600 commits per day. In my experience, there's nothing worse than looking for when a value was changed from 10 to 100, and finding it in a "Merged from X" commit without the history of why the value was changed from 10 to 100.
I don't often (ever?) browse the history, but I _do_ regularly search the history using tooling, (git log | grep <something>), and more is more in that case, even if the history isn't perfect.
If the original commits are:
- Change X to 10 by Foo
- Change X to 100 by Bar
- Change X to 50 by Baz
If that gets squashed into `Change X to 50 by Baz`, you lose the context as to why Foo and Bar changed it, and the values. If I need to go investigate an issue with X, I'd rather have the history of all the changes.
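For what it's worth, the un-squashed history also makes git's pickaxe search work at this granularity. A self-contained sketch reproducing the three hypothetical commits above:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo "X = 10" > config.txt;  git add .; git commit -qm "Change X to 10 by Foo"
echo "X = 100" > config.txt; git commit -qam "Change X to 100 by Bar"
echo "X = 50" > config.txt;  git commit -qam "Change X to 50 by Baz"
# Pickaxe search: which commits added or removed the string "X = 100"?
found=$(git log -S "X = 100" --format=%s)
echo "$found"
```

If those three commits had been squashed into one, Bar's intermediate change would be invisible to this search.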
Depends on the git flow being adopted by the committers.
HNers constantly espouse how clarity is more important than cleverness. It may be the case that a single commit offers more concision than multiple commits which could be perceived as just more noise.
I guess my development culture is flawed, but as a one-man team I'm always time-constrained. So I sometimes don't have enough time to write detailed commit messages.
So I end up with "one commit per feature" rather than "one commit per logical change", and I actively use squash during an interactive rebase when I prepare to merge a completed branch, since my commit history sometimes looks like this:
Backend: new set of APIs for XYZ
Backend: implement feature X (+ some long comment)
Frontend: implement feature X
fix for feature backend
fix for backend API XYZ
front fix
backend fix
Yeah again I know I could do better, but yeah I use squashing for this reason. A lot.
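For a branch like that, a soft reset gives the same result as squashing everything in an interactive rebase, without the editor round-trip. A sketch in a throwaway repo, with commit messages made up to mirror the list above:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "initial"
git checkout -q -b feature-x
echo a >  api.txt; git add .; git commit -qm "Backend: new set of APIs for XYZ"
echo b >> api.txt; git commit -qam "fix for backend API XYZ"
echo c >> api.txt; git commit -qam "backend fix"
# Equivalent of squashing in `git rebase -i`: keep the tree, redo the commits.
git reset -q --soft main
git commit -qm "Backend: add APIs for XYZ"
```

Running `git rebase -i main` and marking the fix commits as `squash`/`fixup` produces the same single commit; the soft reset is just the non-interactive shortcut.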
Well, uh, yes, that's what the commit history generally looks like (although we do generally put "JIRA-9999: " in front of all commit messages, for context). So what?
What I wanted to say is that since I don't have many people other than me looking at my code, there's no reason to preserve the real development history of every feature, and IMO 3 "feature commits" are preferable to my development mess of 50 "fix that, fix this" commits.
Yes, squashing parts of a discrete piece of work together, where it makes sense, then makes it trivial to git bisect any future problems.
Code committed to the mainline should always compile, and should be made of discrete changes. However, do what you like on your unpublished local branch.
> What's even the point of it? It's not like people routinely read the commit history, and when they do, they really would like a complete story, not 20 gargantuan commits that contain 3 years of development.
My company routinely reads the commit history. The commit is usually the 'why', and the code is the 'how'.
It's a mix where I work. Most people aren't at all tidy with their commit history, and I'll often see a PR with a fairly small final diff (maybe a couple hundred lines changed total) with several useless one-word commit messages like "fix".
And then after the PR has gone through review, most people tack on extra commits with equally-useless commit messages like "addressing feedback".
It's infrequent that I see PRs with commit histories that actually chronicle the history of the change itself; it's more a chronicle of the developer's changing thought processes as they try different things, go down blind alleys, change approaches several times.
Most of the individual commits have test suite failures and some of them don't even compile, so it's impossible to bisect across them if an issue is later found.
In those cases, I wish people would just squash (and some do, where I work). Yes, you lose information and separation, but I'd rather have one large working commit than 15 small broken commits followed by one working commit. Ideally people would curate things before submitting their PR, but I've found that most people just don't care, or don't understand git well enough to even attempt to do it.
Sometimes I toy around with the idea of trying to teach people (what I consider) better practices, and insist they are followed, but we all have a limited amount of social capital in our workplaces, and I'm not convinced this is something worth spending it on.
Interesting, it seems like madness to me to have it the other way around. Why bother with a merge commit if all the changes are squashed into one anyway? Why not just fast forward merge as if it was a commit directly to the main development branch?
I do read the git commit history, and specifically to find problems like these. Squashing commits indeed sounds like a terrible idea to me.
There are also people who insist you always need to rebase your commits before pushing, in order to get a nice, linear history, and again I disagree. It's fine when you're rebasing a very short (single commit) history, but for a long history, it not only gets very tedious, but when it introduces new bugs halfway through that history, you may not notice. You will notice later, and then tracking that bug you end up in the middle of your own (rebased) history and you may wonder why you ever did something so stupid, when the actual cause of the bug was the merge of two histories at the end of your work.
I do rebase sometimes, but only when the history is short and I can easily see what I'm rebasing. A history longer than 2 commits should not be changed.
Yes, our company squashes commits (we have tens, probably hundreds, of thousands of commits on the main branch). Yes, I routinely look at commit history. Commit history is also used for doing analytics for performance reviews, such as how many commits you made, what percentage had tests, etc. For instance, my last performance review made note that I had test coverage on 90% of my commits.
Also, we auto-stamp the PR on the commits so we can get the context. I don't understand how anyone could make sense of a large codebase when all the commits are added with the standard inane commit messages (e.g. "fix it", "typo", "name changes", "tests") that people do when building a feature. I routinely have to look at a piece of code, do a git blame, get the PR in the commit and figure out what was being done.
In my case, it's generally because I'm preparing training material.
When I do that, I have two branches: dev and master. master is the one that is exposed to students, and dev is the one I use for prepping, testing and staging.
Yes I do (well, I rebase, not squash blindly), but I do read the commit history. Or well, not so much the complete history, but I do go looking for which commit introduced a particular line, what message went with that change, and what else changed with it.
Of course, that only works if you leave proper commits. The reason people don't do that is that you only look at your commit history if you've kept it useful; if it's never been useful, you never look at it, and so you never learn the benefits of keeping it useful.
From experience of working with new CI/CD systems where one doesn't quite understand what's failing due to ... reasons.
It's really quite easy to get into the 50s or close to 100 commits in a branch, which can lead to a horrendously messy history.
There are also people that like to commit before they try something locally, and their branches are effectively a mess.
All of the above is magnified in a monorepo environment, where you might have thousands of commits a day against master if it weren't for enforcement policies that force a squash.
That's not how it works, really. A simplified workflow:
You take a bug or enhancement of a couple of points, work on it on its own branch, then merge-request + squash the mostly-noise commits into develop/master at once.
A chunk is approximately 3 days, not 3 years. For forward-looking projects, it saves time and works well. "Enterprisey" projects maintaining legacy branches are less well served.
It depends on the situation. If I'm merging a feature branch that has a lot of commits that effectively make up one feature (because a dev had to go back and forth or because there was a lot of feedback), I might squash those commits into one before I merge it into master.
As mentioned elsewhere in the comments here, that also makes it a lot easier to revert said feature if something goes wrong.
We do. PRs are generally smaller than 3 years of development. Sometimes refactoring will be split out, so what could be a single PR with multiple commits instead becomes multiple PRs. This allows review to be focused on each PR and enforces that each commit keeps CI passing.
Phabricator has a quite different (I would say better, others would say worse) development flow philosophy to GitHub/PRs.
Phabricator’s preferred model, which is heavily influenced by Facebook, is to forgo feature branches entirely and just stack many small changes on top of each other, landing as and when you want (this doesn’t preclude you working on feature branches locally, of course, because Phabricator doesn’t care what your local checkout looks like).
Because of this, Phabricator considers each diff to be discrete, and if you have multiple changes making up a single feature they should in turn be broken down into separate diffs.
Personally, I think stacked diffs are the killer feature of Phabricator. Unfortunately I haven’t been able to find a similar flow with PRs (recently we migrated from Phabricator to GitHub for one of my projects), you end up fighting against the tool a lot.
Stacked diffs have been a pain for me as well, but recently I found a tool that makes it super easy to implement stacked diffs on top of GitHub! I started using it a month ago and it makes complex code changes so much easier to split up into manageable chunks.
If you want to learn more you can email me at ericyu3@gmail.com. Also happy to help you get it up and running - just put some time on my calendar at https://calendly.com/ericyu3/15min
I don't think having a history full of "Fix the shaver, maybe yaks don't need 6mm trim" followed by "Fix shaver again, yaks need as low as a 3mm trim", with some more intermediate commits in between, helps in understanding what happened either.
That's much better than looking at a file in a 300 file commit and seeing "merged from XXXX" with no information as to why that one line was changed. I'd much rather spend 30 seconds parsing through the 10 yak shaving commits than have to go trawling through old commits on a file to find the most likely owner of it to ping on slack.
You’re creating a false dichotomy. When people say “squash your commits”, they don’t mean “squash your entire repo into one commit”, they mean “get rid of your ‘typo’, ‘typo fix’ commits”. Your commits should still be small and self-contained.
Actually I think you're creating a false dichotomy. How many people are committing dozens of "typo/fix build" commits and then squashing just those? In reality people are squashing the iteration process of "add an X, add a Y, remove the X because it didn't work, and add a Z instead of an X and Y" into "Add a Z".
If you're simply talking about removing "fix typo" commits, then just don't. Just ignore them. You don't need them, but someone might. They're not hurting you?
When that happens, I usually stick in a comment about it. Generally anything worth committing gets merged anyways so you still get the "Revert X" commit in there too.
yep exactly this. I found early on while having to work with perforce and using timelapse view (git blame but with a nice gui) that lots of intermediary commits like this generate enough noise to discourage even the most determined sleuth from using commit history to solve a crime.
Well-scoped and well-sized commits, squashed, and in my personal preference rebased, provide a commit history that's organized by actual tickets/stories/features/fixes, and that I find navigable very far back.
I'd say squash usage is very much case by case. But yeah equating a feature like squash to large commits is a fallacy. It's a useful tool imo that can be misused, but that doesn't mean the tool is "bad".
I feel like any discussion of squashing or not is only half the story - people need to write good commit messages. A lot of people I've worked with commit too granularly for that single commit to be useful, so squashing gets things grouped more usefully. However, if they still make low effort commit messages, then the history is useless. But it always was useless, squashed or not!
Depends on the Git craftsmanship of the team, is my experience. If they commit every small fix with a useless commit message, history becomes messy real quick. Squashing can be seen as a solution here. Until you learn proper rewriting of history, as in (interactive) rebasing. After which you can put your code changes in every commit you want and order them around as you see fit. But that also depends on the Git workflow that is used and how much branches are shared amongst team members.
I had one job in the past that used Gerrit[0] as the Git tool. One of its features is that it creates a "pull request" for every commit in the branch you push, each of which needs to be reviewed individually. This is really annoying if you're used to organising your work in lots of commits to record each step of your development. But from a project's Git history perspective it makes a lot of sense, as every commit is one change, one feature, one contained unit that is added to the main branch. So instead of the main branch containing countless commits with each developer's complete history on a specific feature (where code is added in one commit to be removed in the next), it contains the features as distinct commits, making them easy to bisect and revert if needed. This looks a lot like squashing, but because you do it before you push your code, you learn to put much more thought into that single commit and the commit message.
Depending on what I'm doing, I'll do squashing, fixups, and commit reordering in an interactive rebase prior to merging, to at least reduce the number of noise commits.
If you use Pull Requests with squashed commits, you can do exactly the same process as described here by isolating the issue to a PR then restoring the branch and bisecting from there.
It seems like a small price to pay for a clean history, given the rarity of occurrences like this.
It makes the initial bisect easier and faster as well, because every commit is a fully working one, and there are no trivial commits that still have to be rebuilt and bisected over. Plus all the commits are found in the order they were introduced to master, rather than randomly interleaved with other feature branches that were built at the same time. Personally I would think any kind of branch cleanup to get the history of a feature branch into a presentable state is a significant price to pay. I've only had to restore a PR branch a handful of times to find the particular culprit.
Because with a test suite, git bisect, and a history that's been reasonably well cared for, I can just tell git "find me the bug" and go do something else until it's done.
I don't like having to go fiddle manually for that to work.
Ideally you have CI set up so the average test failure email points you to the small patch that broke it.
For bugs from the wild that doesn't work, obviously, but I still prefer less friction.
Sounds like you've moved the work forward, where it always has to be done. Rather than doing it in a lazy manner, where most instances can be avoided. Is that a good trade-off?
Like most tradeoffs, it depends on the particular circumstances, I think.
I don't have this set up in my current gig, because we have more pressing priorities.
It is incredibly useful when you do have it, though.
Generally it's more important for really large projects - IIRC Android is really strict about not merging branches containing any commits that break tests, for exactly this reason.
As that implies, you wind up rewriting branches in this approach, too, but for different reasons.
Is it always possible to restore branches from the PR? e.g. do some code review systems automatically delete committed branches after a certain time period?
You mean GitHub? I'm not familiar with squashed commits there, but it seems that it requires that the original branch is not removed from the remote? Because if there is no reference to this branch then it will be garbage collected after some time.
You don't remove PR's branches after merging them?
This article presents a very good argument that clean and squashed commits are important, under a click-baity title.
The ability to use git bisect effectively is one of the more important reasons to enforce a clean and readable commit history by squashing and rebasing before merge.
A history where the majority of commits don't even compile ("sorry, updated test value was wrong", "oops syntax error", "forgot to update these references in the last commit", "big refactor wasn't complete") is a major headache not only to readers but to anyone using automated tools such as bisect. The author here was lucky the commit history was in good shape. The size of the offending commit was reasonable too, so any trivial commits had evidently been squashed away already.
My personal issue with this particular commit is the useless commit message. "Extract CreateTokenValidationParameters method". Well, obviously. But why? What was this intended to result in? Why was particular change made and not something else?
A more suitable commit message would have included something along the lines of "JwtSecurityTokenHandler methods belong conceptually with other code that configures JWT parameters. Break them out to CreateTokenValidationParameters because .."; that would have made much more sense and made the change easier to understand for someone else!
After all, this is how the author describes the patch, when taking the time to do so in order to write a blog post. It isn't that hard.
I would say that this is a good argument for linearizing the history by rebasing, but certainly not for squashing.
Git bisect can only narrow you down to the scale of your commits. If you do infrequent, large commits, it's not very useful. If you do frequent, small commits, it's great. If you do frequent, small commits and then squash them into infrequent, large commits, it's not very useful.
There are plenty of reasons to squash (and, personally, I generally think they outweigh the arguments against), but this is not one of them.
Squashing is a form of rebasing. Nobody suggests commits should be too large to be useful, just that they should form coherent changes. Preferably self-contained enough to be useful on their own.
When you do frequent small commits, unless you are superhuman, the majority of them are false starts or contain errors. Squashing these useless commits makes the history understandable, and enables the use of tools such as bisect.
That's what "please squash before merge" means.
It does not mean you should rebase instead of merge. There may be good reasons for that too, sometimes, but that's not the point here. Should you wish to enforce a linear history, it is still just as important to squash undesired commits.
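On the command line, "squash before merge" corresponds roughly to `git merge --squash`, which stages the branch's combined diff without recording a merge. A sketch in a throwaway repo (branch and file names are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "base"
git checkout -q -b feature
echo one > f.txt;  git add .; git commit -qm "start feature"
echo two >> f.txt; git commit -qam "oops, typo"
git checkout -q main
# --squash stages the branch's combined diff; no merge commit is recorded.
git merge --squash -q feature
git commit -qm "Add feature (squashed)"
```

The result is a linear history where the feature lands as one commit, while the feature branch (until deleted) still holds the fine-grained version.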
> Had this happened in a code base with a 'nice history' (as the squash proponents like to present it), that small commit would have been bundled with various other commits.
This is a misunderstanding of what proponents of commit squashing advocate. Often, when working, you end up with multiple commits which represent a single change:
remove debug logging and fix bug
more logging
debugging
typo
Squashing these commits together into one coherent change makes for a history that is much easier to understand.
Squashing unrelated commits, however, doesn't help anyone.
I can only sit in amazement as I read this thread, watching seemingly smart people advocating throwing away history for the sake of tidiness. It's maddening.
Those are four commits. That's what happened. It does not matter that it's untidy. It's your history.
Just leave it. It will save you a lot of work some day when you need to find the error you introduced when you accidentally removed one too many lines on that "remove debug logging" commit.
> It does not matter that it's untidy. It's your history.
That is only true if you are working in a personal git repo
If you are committing against a repo with many others though, having a tidy git history is important. If I discover a bug and bisect to the "remove debug logging" commit, it is hard to revert that - or to even understand what it means. I then have to go in and try to determine which feature that debug logging is for and how many commits I need to revert to get back to a sane place.
My general rule of thumb is that I squash commits (when merging to a public repo) if the commit doesn't warrant time into creating a proper commit message (explaining what I am fixing, why, etc). I have no problem creating a set of 10 commits to merge in one feature, but each commit needs to be individually reviewable and understandable.
There’s “throwing away history” and then there is just clutter. If you added a feature and then ten commits to fix typos on top of it, just squash it into the original one and pretend like it was right all along. It keeps the repository sane and not full of “typo fix” commits, and, like, I don’t really mind looking for a change in a diff that is 20 lines instead of a dozen that are one.
Here is the thing: With git, the history looks the way I want it to look like. It is not a list of how I worked and in what sequence. Unless I want it to. I generally don't and instead prefer the history to be comprehensible by someone looking for how features are put together and interact and changed over time. THAT, saves time when I need to look at it again.
> Just leave it. It will save you a lot of work some day when you need to find the error you introduced when you accidentally removed one too many lines on that "remove debug logging" commit.
I understood the parent as "remove debug logging" removing what was introduced by the previous two commits. In this case, squashing the commits will actually make the issue _much_ more obvious, since the accidentally removed line now stands out by itself without the clutter of the back-and-forth changes.
I have done one or the other. My personal experience is that intermediate commits provide low value. YMMV but I now know what works for me.
I definitely leave in commit A and a revert of commit A if I think it'll be useful.
The way I see it is I'm leaving breadcrumbs for myself for later. It's obvious that a sequence of keystrokes is useless (and most IDEs will undo a large number at once - correct behaviour) and that no history is useless so you have to find the thing that works for yourself or your team wherever.
I already spend a fair amount of time on other people's problems and they aren't doing much to save me, so what is the point of being obsessive with my own problems such that I may avoid spending a little bit of time on them later?
> Squashing unrelated commits, however, doesn't help anyone.
This however is something I've seen people do. Worst case I've encountered was a group of people that squashed all commits belonging to a whole product release - we are talking a team of 6 people working full time for half a year. Absolutely atrocious behavior, but there are lots of weird people out there.
In a given team, mandating the squashing of commits essentially means admitting that pull requests' branch histories tend to be far from semantically valuable.
Which can be perfectly fine, as it's hard to ensure that all developers use Git in an optimal way (I'm thinking of intentful use of interactive rebasing), uniformly.
The only problem I find is when squash proponents claim their choice is superior. It's not; it's only superior if you aren't willing to maintain great history as you work on a given branch.
Of course there are also added benefits to a fine-grained history, as the article mentions.
My experience having maintained a production app with a fine-grained semantic history for 5 years is overwhelmingly positive. Understanding root causes, confidently reverting things etc becomes a much quicker job - particularly important when production is red.
Right. The problem is that there are two use cases for commits:
- a historical, fine grained log of changes
- and a log of merged features.
There's value in keeping both of these data sets. But the commit log as it stands can't easily serve both masters.
Once you can see the problem for what it is, the solution is simple. Instead of conflating these two use cases into the concept of a 'commit', we need separate tooling for each of these use cases. The commit log should probably house the historical record. And then we need a way to mark a set of commits as belonging to a particular feature's development. That could be achieved either by adding special support in git or via convention using commit messages.
Either way, I want to be able to see the commits in my repository grouped by the feature that they belong to. And I want that data set browsable on github, referenced against the corresponding github issues when that's appropriate.
If you consistently use feature branches and have meaningful summary commit messages in the merge commits as they land on the mainline branch what you're describing is simply:
git log --max-parents=1
git log --min-parents=2
E.g. try this on the git.git repository (not perfect there, since Junio doesn't use this pattern all the time), but it's good enough for a quick demo.
The most useful convention in a DAG like git is to have the topology of your commits reflect your workflow.
One obvious requirement for bisect to work is that your code builds on every commit - and so does everybody else participating in the git history.
Straw poll - do you enforce this? If so, for literally every commit, or do you use partial squashing to maintain this property?
While that's certainly a _desirable_ property, I've never really been concerned if, say, the penultimate commit on a PR failed CI. It feels like it would be a hassle.
I don't "enforce" this but I do aim for it, and yes, I do partial squashing throughout my workday.
As I work on a feature branch, I'll check in WIP commits as checkpoints, especially at EOD. I don't expect these to pass the full CI suite.
But as the code starts to shape up, I'll unstage all those WIPs and start to group the changes into logical commits. As work progresses, I'll use `git add --patch` to split new lines of code into those existing logical commits. Sometimes I'll split one up, sometimes I'll group two together; it's still flexible and amorphous at this point.
By the time I'm ready to merge upstream, these commits tend to be neat, focused, and functional, and I do check to make sure they pass the relevant tests (though I don't enforce a full CI build here).
Then a rebase from master, push to CI, and then a no-ff merge commit into master to retain both the low-level commits and the ability to easily revert the whole lot.
It might seem like a ton of busywork, but I find that staging atomic commits like this doubles as an excellent line-by-line review of the code I've written. It also forces that review step to happen throughout the process rather than all the way at the end when I've forgotten all that deep context.
What (I assume) is normal is that you have PR validation and CI builds that only maintain the stable shared branches.
If the validation isn't extremely fast (seconds or minutes) then maintaining it for every commit is almost impossible. It would just cause other side effects, like people avoiding commits because they don't want to run an hour-long test suite more than once.
We use JavaScript at work so even if a part of the application doesn't compile, we usually can run most of the tests. We don't bisect very often and so far, every commit has been good enough for the specific tests we wanted to execute in case of a bisect.
I've always been careful about this even though I've not had the habit reward me yet (3 years of exp). If I'm rebasing, I go through every commit and make sure it builds.
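That per-commit build check can be automated with rebase's `--exec` flag, which runs a command after replaying each commit and stops at the first failure. A toy sketch, where a trivial `test` invocation stands in for a real build command like `make`:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "base"
for i in 1 2 3; do echo "$i" > "f$i.txt"; git add .; git commit -qm "step $i"; done
# Replay the last three commits, running the check after each one; the rebase
# stops at the first commit where the command exits nonzero.
git rebase --exec "test -e f1.txt" HEAD~3
echo "every commit passed the check"
```

`--exec` implies an interactive rebase but runs without opening an editor, so it works in scripts too.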
TIL about `git bisect` and it's really vindicating.
There is no such requirement, you can just skip commits which cannot be tested. You just make a simple test-case just for current bug outside the repo.
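Concretely, a `git bisect run` script can exit with code 125 to mean "this commit can't be tested", and bisect will step around it instead of misclassifying it. A self-contained sketch where one commit is "unbuildable":

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo 0 > bug.txt; git add .; git commit -qm "c1 good"
echo x > broken-build.txt; git add .; git commit -qm "c2 does not build"
git rm -q broken-build.txt; git commit -qm "c3 good again"
echo 1 > bug.txt; git commit -qam "c4 introduces bug"
git commit -q --allow-empty -m "c5 later work"
# Exit 125 when the tree is "unbuildable"; otherwise test for the bug.
cat > check.sh <<'EOF'
#!/bin/sh
[ -f broken-build.txt ] && exit 125
grep -q 0 bug.txt
EOF
chmod +x check.sh
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
culprit=$(git bisect run ./check.sh | grep " is the first bad commit" | cut -d" " -f1)
git bisect reset >/dev/null 2>&1
```

Bisect skips c2 and still pins the bug to c4, even though c2 could never be tested.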
It is not related to squashing or not. One of the main rules when using git (or any other VCS, I guess) is that each commit should be atomic.
IMO squashing commits is something that you should do locally, not remotely. For example, I'm currently debugging a fairly large C application on an embedded system. Each bug has its own branch. When debugging / testing / trying to fix it I tend to do a lot of commits. When the bug is fixed I do an interactive rebase to pick / drop / squash commits in order to have only a clean one at the end (and then push it).
Why do it locally when most modern repository software (GitHub, GitLab, BitBucket, etc) can do it for you when a PR is merged?
My point of view is that if work is being done on an individual feature or bug fix, having a view of the individual commits might help give any reviewers important context on how a final solution was arrived at. It can also be helpful to be able to see the specific changes that have been made since a prior review.
While in most cases I favor squashing commits when it comes time to merge into the parent branch, it seems like doing it manually just creates extra work and potentially throws away information that may have been useful to reviewers.
Squashing renders git log and bisect nearly unusable; that should be common knowledge, but I guess not. Normally there's never a reason to squash: small fixes can be done with git commit --amend for the last commit, or with git rebase -i using the fixup keyword, but again only for really trivial things like typos and such, otherwise you get the same problems with log and bisect.
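The `--fixup` / `--autosquash` flow mentioned here, end to end in a throwaway repo (`GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the rebase runs non-interactively):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "initial"
echo "helo" > doc.txt; git add .; git commit -qm "add docs"
target=$(git rev-parse HEAD)
echo "hello" > doc.txt
git commit -qa --fixup "$target"   # records a "fixup! add docs" commit
# --autosquash pre-arranges the fixup under its target in the todo list:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
```

After the rebase, the typo fix has been folded into "add docs" and no "fixup!" commit remains in the history.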
You can revert a merge commit by providing which side of the merge is mainline:
git revert -m 1 $sha_of_the_merge_commit
That provides the "revert a commit and remove entire features" use case, assuming the original feature was all encapsulated within the merge (branch) in question. You can get the "readable commit history" with:
git log --no-merges
You can have the best of both worlds. git has had these features for as long as I can remember, but devs have had the "NO MERGES ONLY CLEAN HISTORY" platitude repeated so often they must think git lacks an alternative.
I think much of the difference here depends on your workflow. Where do you run tests? How do you manage build infrastructure? What is your process for code review? What is your process for validation and testing of shared branches? How do you manage releases?
In my opinion, squash merges make the most sense when the following are true:
1. Developers cannot push directly to shared branches; all commits into shared branches are done via pull requests with mandatory code review.
2. Shared branches must build and pass tests at every commit.
3. The team builds features incrementally and uses the "Branch by Abstraction" approach (feature flags/experiments) to ensure functionality can be merged before it is ready to be enabled in production.
4. Changes that merge into the shared ("trunk") branch are released frequently.
Now, you may argue against some of those practices, and if you end up in a different place on one of those, squashes may not make sense to you. But then the actual difference of opinion is elsewhere. It's not really about squashing, it's about how much code is a reasonable granularity to code review at one time.
If you can keep the code review cycle tight, then a pull request branch basically becomes almost like a virtual pair-programming session. There may be some back and forth as the author and reviewer settle on a final form of the commit. But I don't see the commits that happen along the way as valuable in the least. (In my experience, the vast majority of intermediate commits are things like "fix typo", "make linter happy", "rename this function because code reviewer pointed out it was named inconsistently", etc.) Instead, my mental model is that a pull request is like a mutable commit: you keep mutating it until it's the commit you want, and then you merge it. Since my mental model is that a pull request is a kind of commit, it makes sense that when it merges in, it does so as a single commit.
Again, I'm not saying that this is the only valid way to work. But neither is it an invalid way to work.
As a meta-comment: When you see someone else making a technical decision that doesn't make any sense to you, rather than instantly assuming that they are a clueless incompetent, it's usually more instructive to assume that the decision does make sense to them, and to try to figure out what about their circumstance is different from yours to make that the case.
I don't see how squashing or not would have made this case hard to find. Once you identify the bad squashed commit, you can always reset to HEAD^ and check files out back and forth till you find the failing change.
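That file-level search inside a squashed commit can even be scripted. A rough sketch in a throwaway repo (the file names and the grep-based "test suite" are stand-ins for a real build/test step):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email dev@example.com && git config user.name dev

echo ok > a.txt && echo ok > b.txt
git add . && git commit -qm "good state"
GOOD=$(git rev-parse HEAD)

# one big "squashed" commit touching two files, one of which is buggy
echo ok2 > a.txt && echo bug > b.txt
git add . && git commit -qm "big squashed commit"
BAD=$(git rev-parse HEAD)

# walk the changed files back in, one at a time, until the failure appears
git checkout -q "$GOOD" -- .
for f in $(git diff --name-only "$GOOD" "$BAD"); do
    git checkout -q "$BAD" -- "$f"
    grep -q bug "$f" && { echo "first failing file: $f"; break; }
done
```

With a real test suite in place of the `grep`, this narrows a big squashed diff down to the first file whose change breaks the tests.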
Great discussion in this post. While I do generally prefer squashed commits when merging a branch, I've absolutely seen that cause some pain in specific circumstances.
I think part of the problem is that modern repository software has built an 'additional layer' on top of git (i.e. pull requests) that most of us have become accustomed to using in our day-to-day workflows. Git itself doesn't really have a 'lossless' way to persist that extra information.
Squashing commits at merge gets us close, but ultimately we’re throwing away potentially useful information by doing so. The main time I’ve seen it be problematic is when multiple people are working on a complex feature where PRs are getting merged into a long-lived feature branch rather than directly to the default branch.
Say I need to base my work on someone else's WIP branch for said feature. If they squash their commits when merging their work into the feature branch, I'm going to end up with merge conflicts that need to be manually resolved, because the individual commits I had based my work on no longer exist; they've all been squashed into a single new commit.
What if git supported a sort of 'soft squash' concept, where associated commits could (optionally) be grouped together with additional metadata? That would let the application consuming the git history (IDEs, CLI tools, etc.) decide how commit histories are presented to the user.
> Not all bugs can be reproduced by an automated test
While that may be true, I'd say in this case, order-dependent tests are evil. This case study pretty much fits the definition -- the test result changes depending on what other tests are run before it (due to shared state across tests). It certainly is possible to write automated tests for this -- reset that state to its default before each test!
One thing that Google recently (in the past year or so) added to its CI setup is a job that goes through periodically and randomizes the order in which tests are run, thereby increasing the chance order-dependent tests are caught. (Early on, I remember being annoyed by all the triggers due to floating point error accumulation in a monitoring library -- do I really care about an error of 1e-10 when my noise is >10?)
That's a long-winded way of saying that it's also possible to catch order-dependent tests by shuffling the order in which they run, although built-in support may vary depending on your testing framework. (from a brief search, it looks like rspec and junit both support randomization; xunit forces it).
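As a minimal illustration of the shared-state problem and its fix (all names invented), here is the pattern in Python's unittest: a module-level dict would leak between tests, and resetting it in setUp removes the order dependence:

```python
import copy
import unittest

# Shared mutable state: the classic source of order-dependent tests.
DEFAULTS = {"retries": 3}
CONFIG = copy.deepcopy(DEFAULTS)

class TokenTests(unittest.TestCase):
    def setUp(self):
        # Reset shared state before every test, so no test depends on
        # what happened to run before it.
        CONFIG.clear()
        CONFIG.update(copy.deepcopy(DEFAULTS))

    def test_a_mutates(self):
        CONFIG["retries"] = 0  # would leak into later tests without setUp
        self.assertEqual(CONFIG["retries"], 0)

    def test_b_reads_default(self):
        # Passes in any order thanks to the reset in setUp.
        self.assertEqual(CONFIG["retries"], 3)
```

Without the setUp reset, test_b_reads_default passes or fails depending on whether test_a_mutates ran first, which is exactly what order randomization is designed to flush out.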
That said, I absolutely agree with the other conclusions, especially that binary search for debugging is immensely useful (even when dealing with Google-scale tens of thousands of unrelated commits due to monorepo), and definitely empathize with debugging often taking much longer than the bugfix.
A lot of people seem to suggest that squashing PR branches is better because otherwise you get broken commits, which don't play well with "git bisect".
But IMO it's not a big deal: you just do "git bisect skip" for those commits, and at the end of the bisecting session, instead of one bad commit you'll get a couple, for example one commit that builds and two skipped ones that don't. It's very likely that looking at only these three commits will help you find the issue, because they might be part of a much bigger branch which would otherwise be squashed into one huge commit with an enormous diff.
I much prefer to require that PR branches are always merged without fast-forwarding, so there is always a merge commit for each PR. Then you can display the list of these merge commits with "git log --first-parent", and they should always build because the build server verifies them.
Unfortunately "--first-parent" doesn't work with "git bisect" yet, but it finally will in the next git release! [1]
I've really never understood this obsession with "preserving" history. When I work on a feature or bug I tend to commit often, because it's cheap, and it can save my ass if I go in a bad direction and break something that used to work in one of my previous commits. I subscribe to a code fast philosophy, and work quickly to explore the space, find unknowns, get a working solution, write tests, and do final cleanup, docs, and optimize if needed. As such this would be a common commit history for me that would end up in a small PR. A large PR for me might have 50+ commits.
- initial prototype of feature x, mostly working
- feature x working as intended in requirements
- fixed edge cases not originally identified when thinking about feature x
- rearchitect feature x a bit now that it's better understood
- write tests for feature x, most pass
- get all feature x unit tests pass
- a few more test cases for feature x
- and a couple of integration tests for feature x
- code cleanup for feature x
- fix typo from last code cleanup resulting in bug
- better docs for feature x
- fix typos in feature x docs
- update main readme to include notes about feature x
Can anyone explain to me the value in preserving this history, because I'm just not seeing it, and it would completely muddy up the overall git history. There's tons of intermediate work we're not concerned about preserving, like the notes I scribble on paper, so why are we so concerned about preserving all these commits? I admit there's a small chance it's useful in rare circumstances, but I'd rather optimize for the common scenario.
Some software engineers can be fanatical and rule-bound; commit etiquette is merely one milieu where rigid thinking can be applied, and with the usual detriment. I applaud the author for presenting a clear case how reality is more complex than many beginning to intermediate developers would like it to be when it comes to commit history.
The main focus shouldn't be on keeping the commit small; it should be on ensuring that the commit comprehensively encapsulates a unit of change in the codebase. Dogmatically pursuing tiny commits will lead to a noisy commit history, and all for a meagre payoff in the niche situations where it helps.
BTW if such a thing really does happen in production and you already squashed and merged, you can always fall back on git reflog on the developer's machine. Commits are never really destroyed in git; the branch simply shifts focus to another set of commits. The old commits are still there, dangling and not referenced by any branch, but still reachable through the reflog.
You can always "bisect" through space (execution flow) OR time (commits) to find a bug.
I usually prefer through space, because
1. it actively helps narrow the buggy line in the code you're working on, instead of reflecting over different snapshots of code
2. sometimes you have nothing to bisect on, if it's code you're actively writing.
The author seems to have started bisecting through space using the debugger. Unfortunately they had the wrong understanding of the execution flow and quickly stopped after one step at the start of the route. Had they realized the route wasn't triggered they could have checked what happened in the authentication code.
Are you saying he should have stepped through the framework's authentication implementation in the debugger? The bug was in the configuration, if I understand correctly, so there wasn't much to step through there.
Squashing gets you in the habit of force pushing, and force pushing greatly increases the risk of clobbering someone's work. I've seen far more hours wasted on lost commits than could credibly be wasted on a slightly more verbose history.
Squashing also turns cherry picked commits into merge conflicts.
I'm not a fan. I can always go back to the feature PR and read the ticket details and diff commentary when I want the history with a full diff. And of course very old history isn't very relevant because less of it remains in the present, so rot of ancillary systems isn't a huge concern.
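Worth noting that the clobbering risk is much smaller with --force-with-lease, which refuses the push if the remote has commits you haven't seen. A scratch-repo sketch (all names invented):

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git clone -q "$base/origin.git" "$base/alice" 2>/dev/null
cd "$base/alice"
git config user.email alice@example.com && git config user.name alice
echo one > f.txt && git add f.txt && git commit -qm "one"
git push -q origin HEAD:refs/heads/feature

# meanwhile, a colleague adds a commit we haven't fetched yet
git clone -q "$base/origin.git" "$base/bob" 2>/dev/null
( cd "$base/bob" \
  && git config user.email bob@example.com && git config user.name bob \
  && git checkout -q feature \
  && echo two > g.txt && git add g.txt && git commit -qm "two" \
  && git push -q origin feature )

# rewrite our commit; --force-with-lease refuses to clobber the colleague's work
git commit -q --amend -m "one, rewritten"
git push --force-with-lease origin HEAD:feature 2>&1 \
    | grep -iE "stale|rejected" || echo "push rejected, nothing lost"
```

A plain `git push --force` in the last step would have silently thrown away the colleague's commit; with the lease, you are forced to fetch and reconcile first.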
I'm no squash fan either, but git very rarely actually clobbers data. (As long as it's been committed at some point, and wasn't just sitting in the working tree.) `git reflog` can get back pretty much anything, as long as it happened relatively recently.
That’s why I said “as long as it happened relatively recently.” Of course, the commits in question should exist locally on someone’s machine, so we shouldn’t need to worry about what the remote does.
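For the record, the reflog rescue looks like this in a throwaway repo (names invented):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email dev@example.com && git config user.name dev
echo v1 > f.txt && git add f.txt && git commit -qm "first"
echo v2 > f.txt && git commit -qam "precious work"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1      # "oops": the commit vanishes from the branch
git cat-file -e "$lost"         # ...but the object is still in the store
git reflog | head -n 3          # and the reflog still points at it
git branch recovered "$lost"    # resurrect it on a new branch
git log -1 --format=%s recovered
```

The window is bounded by reflog expiry and garbage collection (90 days by default for reachable reflog entries), hence "relatively recently".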
Most people at my previous place did squash merge on pr completion and we cherry picked quite successfully, so I'd be interested to know how you get to:
> Squashing also turns cherry picked commits into merge conflicts.
It really depends. I've seen developers commit tiny changes all the time, which do not make sense on their own - I wouldn't want to trace a small logic change over 10 commits. Also, sometimes you commit, then realise that there's a stupid mistake you've made, and then you fix it with another commit. In those cases it does make sense to squash the commits so that you get a normal, logical commit in the history. On the other hand, squashing perfectly legit commits together into a horrendous single commit is a pretty bad practice, for obvious reasons.
Commit history serves two largely separate purposes:
1) To provide a code-level record of what has changed in individual files and why.
2) To provide a high-level record of what features were introduced and what bugs were fixed over a period of time.
Often people will forget about one of them when arguing for a particular approach.
Squashing can make 2 easier but annihilates 1. Rebasing gives you 1 but makes 2 difficult, or requires that you track high level changes in an external system. In theory, approaches using merge commits can give you both but they are often difficult to apply in practice.
> Clearly, you're not familiar with this code base, but even so, you might be able to intuit the problem from the above diff. You don't need domain-specific knowledge for it, or knowledge of the rest of the code base.
A question I ask is: if I reverted/rolled back to this commit, would the system still work? Commits that require other commits in order to work should never stand in isolation, in case a rollback is required. You should be able to check out any commit in the entire system and have a (hopefully) working program.
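One way to enforce that property is to check out and build every commit in isolation. A scratch-repo sketch, with a trivial build.sh standing in for a real build:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email dev@example.com && git config user.name dev

printf '#!/bin/sh\necho building\n' > build.sh && chmod +x build.sh
git add . && git commit -qm "c1: add build script"
echo lib > lib.txt && git add . && git commit -qm "c2: add lib"
echo app > app.txt && git add . && git commit -qm "c3: add app"

branch=$(git symbolic-ref --short HEAD)
# check out every commit in isolation and make sure it "builds"
for sha in $(git rev-list HEAD); do
    git checkout -q "$sha"
    ./build.sh > /dev/null || { echo "broken at $sha"; exit 1; }
done
git checkout -q "$branch"
echo "all commits build"
```

CI setups often get the same effect during history rewrites with `git rebase -i --exec "make test"`, which runs the command after applying each commit.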
Unrelated to git but as soon as the authorization was failing with even a correct JWT token, the first thing I would have looked at is the AuthorizeHandler and registration stuff in Startup.cs.
It became crystal clear when the breakpoint was not hit, and when removing the Authorize attribute worked.
The repeated insistence that "my tests were passing, so it must be the framework" kind of annoyed me, as the Authorize attribute is not magic; there is wiring that needs to be written.
When you commit your fully tested and bug free code to the master branch, developer WIP commits should be squashed. It's one thing to commit your WIP commits to a toy git master branch you control but as soon as others look at it, it doesn't scale.
I've asked this question in job interviews. It's amazing what people say.
Sad that you use your strict etiquette as a leading question in job interviews. How do you know that code is bug free when you commit to master? What qualifies as fully tested? If you are convinced that there are absolute and known answers to these questions I would suggest analyzing your logic again.
Not squashing commits doesn't scale. That's the key to discerning who has experience on large projects. Intermediate commits are of no interest to a codebase with dozens of developers.
Mercurial's evolve extension has fold, which is similar to squash. However, it still has all the individual commits if you wish to examine them - they're just "hidden" and the usual commands (logs, etc) will show just the squashed commit.
Git does hide the old commits as well. What git doesn't do is track which commits are replaced by which. So sharing mutable history and seeing how a commit evolved over time requires more heuristics than necessary.
Yes, unreachable commits in Git are GC'd eventually, but you can disable it.
In Mercurial, hidden changesets are kept locally indefinitely, but they are not exchanged; only their obsolescence markers are. So you always know the meta-history of a changeset, but not necessarily its original content.
Is there a way to make git keep the unsquashed history, linking it to the squash commit? That would solve the problem here, because anytime bisection locates a squash commit you could just follow the pointer to the unsquashed history and repeat.
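Git has no built-in link, but you can approximate one: a tag keeps the unsquashed branch reachable, and a note on the squash commit points back at it. A sketch in a throwaway repo (ref names invented):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email dev@example.com && git config user.name dev
echo base > f.txt && git add f.txt && git commit -qm "base"

git checkout -q -b feature
echo a >> f.txt && git commit -qam "wip 1"
echo b >> f.txt && git commit -qam "wip 2"
full=$(git rev-parse feature)

git checkout -q -                      # back to the original branch
git merge --squash feature > /dev/null
git commit -qm "feature (squashed)"

# keep the unsquashed commits reachable, and link them to the squash commit
git tag archive/feature "$full"
git notes add -m "unsquashed history: $full"
git notes show HEAD
```

When bisection later lands on the squash commit, `git notes show` hands you the tip of the preserved fine-grained history, and you can repeat the bisection inside it.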
Worth mentioning that, while the `git bisect` algorithm is pretty easy to run by hand if you have a linear history, `bisect` also handles very complex branching and merging histories as well. Very clever and worth using!
I never understood the need to squash commits (or rebase). If you do merge requests and use merge commits (like GitHub or GitLab do), a "nice" history is a small script away. It should even be part of the GitHub/GitLab GUI. Do not lose information about the development history!
I think this depends on your policy. The policy at my work place is that every commit into master is always a merge commit. When looking at the git logs, we then always do:
git log --first-parent
This only shows the top level commits (either a direct commit or a merge commit) and doesn't show any of the subcommits in the branches.
This gives us a very clean history, something like:
commit ce29331f7da82ce528ca6e437b8893248a842169
Merge: a14522cbf 936ef9f90
Author: Joe Sample <joe.sample@example.com>
Date: Tue Jul 7 14:34:20 2020 -0500
ISS-2047 - Add ids to user invitation or creation
commit a14522cbf32487dc590c8b6f3332d3fc9371a640
Merge: d08342170 cb94a86b1
Author: Jane Doe <jane.doe@example.com>
Date: Tue Jul 7 14:28:28 2020 -0500
ISS-2032 - If an enterprise is disabled, their api keys should be disabled
commit d083421702e3ff50bc4c62e85b687e172e4bfe76
Merge: 0d167d25e ba3900a4a
Author: Joe Sample <joe.sample@example.com>
Date: Tue Jul 7 14:25:41 2020 -0500
ISS-2045 - Return a reason of why the invitation was auto-cancelled
Then if I want to see everything that happened in the branch that was merged in, I can just run something like (using the first merge above):
git log a14522cbf..ce29331f7
and see all the commits that were in that feature branch. No need to squash things or hide them.
I really wish git, by default, showed commits under a merge as a tree, rather than as a flat list. This is what bazaar (bzr now breezy) did, and it made a lot more sense when looking at the history.
I don't get the "no merge commits" argument - git log has dozens of options allowing you to bend it to your use case.
To address the grandparent's claim that "a clean history has no merge commits": I would argue that a clean history is also a lie if you're working in a branching workflow (hint: you should be working in one most of the time).
If you want to avoid (note: not eliminate) merge commits then make sure to rebase against master before merging, and then ensure merges are fast-forward merges.
A 'merge commit' is the commit that ties together two strands.
* merge commit
|\
| * branch work
| |
| * branch work
|/
*
If you squash to merge, typically you're also going to rebase it (equivalently, if it's more familiar, cherry-pick the squash onto the branch your 'merging' it into).
* squash cherry-picked / rebased
| * both branch works squashed
| * | branch work
| | |
| * | branch work
|/_/
*
(In this case the target branch could have been fast-forwarded, but this also works if there's some other work in the meantime:)
* squash cherry-picked / rebased
|
* something unrelated
| * both branch works squashed
| * | branch work
| | |
| * | branch work
|/_/
*
I squash most feature PRs, but this is into a dev/qa branch initially. This is later merged into the actual release branch where we obviously want to maintain each of the individual commits.
Reason for the feature squash: most of the context of the original commit(s) is meaningful only to the original dev, who is free to keep that history locally (or share it with others). There are times it's convenient to structure the commits in a certain way, but this is often most useful at PR review time, and less so after merge.
Granted, you may want to be able to re-review code at a later date, and hence keep that structure, but hopefully this is a rare case, and PRs are not often so huge.
Hi, Developer Evangelist at GitLab here. I would say it depends on the workflow; unfortunately there is no right or wrong here. I'll try to share some situations from my development experience in the past years, also as a Git/GitLab trainer:
## Squash
I often start in a feature branch with a new proof of concept, or other code which is persisted in several commits. Sometimes I'll iterate on a few things, like testing a change in a GitLab CI yaml, and checking whether it works. Or a different compiler flag to enable faster package builds. Or a refactored function which needs to run all the e2e tests to prove the performance gain.
These changes may, or may not, work. When they do not work, I'll reset the commits: either soft to keep the changes, or hard to throw away the attempt. This results in a changed history and a force push to the remote branch. In the case of a shared branch, I message colleagues with "git fetch && git reset --hard origin/branchname".
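For reference, the soft/hard distinction in a throwaway repo (file names invented; the force push is shown only as a comment, since there's no remote here):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email dev@example.com && git config user.name dev
echo stable > f.txt && git add f.txt && git commit -qm "good state"
echo experiment > f.txt && git commit -qam "spike: try something"

git reset -q --soft HEAD~1   # drop the commit but keep the changes staged
git status --short           # shows f.txt as a staged modification
git reset -q --hard HEAD     # ...or throw the attempt away entirely
cat f.txt                    # back to "stable"
# after rewriting a shared branch, prefer:
#   git push --force-with-lease origin branchname
```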
Within pair programming sessions, we were often left with commits like "Add REST API HTTP server, WIP 2" in branches and, depending on availability, either one of us continued. At a certain point in development time, we decided to squash and amend the commits. Sometimes not all of them, as rebase/squash also allows you to do the following:
c1 s \
c2 s /
c3 p
c4 s \
c5 s /
Which squashes c1+c2, leaves c3, and squashes c4+c5 again. You can navigate this on the CLI with "git rebase -i HEAD~5".
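In "git rebase -i HEAD~5" terms, that plan corresponds to a todo list like this (commit subjects invented; note that the first commit of each group stays "pick"):

```text
pick   c1  Add REST API HTTP server
squash c2  Add REST API HTTP server, WIP 2
pick   c3  Fix unrelated typo
pick   c4  Add request handler
squash c5  Add request handler, WIP
```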
## Rebase/Merge
Short-lived branches which are quickly merged back to the main branch shouldn't cause problems with a broken deployment. In case you get a task assigned where the main branch is far ahead (say, 100 commits or more), it may be the case that
- Branch and merge request works fine, CI/CD pipelines are green
- Changes in the main branch which affect your feature.
These changes can be
- Function interfaces are renamed, or no longer exist. Easy to fix upon rebase, as the build/run simply stops working.
- Runtime changes; for example, queries take a longer roundtrip due to a refactor. Your feature only takes the old behaviour into account and increases the runtime complexity. Or it consumes 10x the memory, resulting in OOM crashes later.
The last change may not be immediately visible, as it involves staging environments and application performance monitoring results.
### Merge without Rebase
If said changes occur, and the rebase did not happen, the green CI/CD MR is merged back to the main branch. Depending on the releases, you either roll into production, or after days/weeks/months, a new release is cut.
At that point, the regression may be seen in the main branch, causing delayed analysis and debugging. Oftentimes on-call alerts and all the debug fun which may lead to burnout (been there myself).
### Merge Request with Rebase
During the final review, and prior to the merge, the changes are rebased against the latest base in the main branch, to see whether they still compile or are affected by any other changes.
A rebase puts the existing commits onto a new commit base, which changes the calculated checksums. Therefore all commits are newly generated; the author date is preserved while the commit date changes.
### Merge Commits
There are different opinions on them. One of them is to always rebase the MR and then do a merge with a commit. Rationale: Even without GitLab/GitHub/etc. you can reliably see the git graph on the CLI or with other visualization tools.
I've recently seen the possibility of referencing a PR/MR from a commit as the merge-from-branch reference, without a dedicated merge commit. This can be handy for avoiding merge commits, using GitLab/GitHub/etc. to store this detail in their database. It is also a form of vendor lock-in, in that a native "git clone" does not provide this information from Git's database.
That being said, I used to dislike merge commits. With enriched details, and CLI work, I now prefer them again. Git commits as datasource are valuable, and they can be shown/parsed in any environment.
### Rebase, Merge, ... large environments?
This can of course get more complex, with fast moving main branches and lots of merges which depend on each other, and should not reach the main branch. Instead, you'd want them to be queued and tested. We experience that at GitLab quite often, and have created so-called "Merge Trains" which ensure that all MRs in such a queue/train are taken into account: https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipeli...
## A personal note: The best merge/rebase strategy is nothing without tests
I've been working on a monitoring tool in the past which ran in distributed environments, and often needed to fix bugs with memory leaks or other performance issues in multi-threaded scenarios. Things you do not see immediately when the MR/PR is green. There was one commit which caused an OOM crash after 3 days of runtime, but only in cloud environments with >100 satellite nodes.
The turnaround was to bisect all the commits, and run each candidate in production for 3 days until the crash occurred. IIRC we had 1,200 of them to binary search.
Now the question is:
- fewer, squashed commits, or
- more development history?
In this case, fewer commits would have unveiled the error sooner. The resulting commit would have been larger and harder to debug & fix, though.
In the end, it did not really matter. The thing which would have helped: a reliable test environment coupled to CI/CD which ensured that specific commits & MRs/PRs ran in dedicated scenarios and alerted on breaking changes soon enough, before the release happens.
## Conclusion
One thing which greatly helped: Looking how others do it, Open Source projects and customer success stories and webinars, online training sessions. Even though you may not adopt the workflows, trying them out is a good way to learn. For instance, "trunk based development with feature flags" is something different to well-known branching models for me, I needed to try them out first to change my opinion. They are indeed useful for certain scenarios.
While committing changes, and keeping them un-squashed throughout MRs, I always remember that I will very likely be debugging the changes later on. And for whoever finds the MR later, documenting every thought or idea in a commit or MR comment can help.
Side comment: it's curious how the author went from "not my code", to "the problem was in the code".
I say this because as a low level/system developer, I often have to solve problems for the people integrating my platforms from higher level languages. When something doesn't work they come to me, and it often goes like this (with quotes from the article):
First stage: someone else's fault.
"the problem is environmental: a network topology issue, a bad or missing connection string"
Second stage: magical things, but not my code.
"Not my code ... the problem occurs somewhere in the framework"
Third stage: acceptance.
"Global variables are evil. Who knew? ... It took me hours to find the bug, and ten seconds to fix it."
[0] https://chris.beams.io/posts/git-commit/#why-not-how