Two Years of Squash Merge (2019)

waffletower · on April 30, 2021

Many developers naively sell squashing using a clarity argument. By squashing you lose historical information: there are times when the content of a merge requires a paper trail, times when individual commits can help separate the portions of a merge you would like to keep from those you would like to roll back. Perhaps in a 10 times a day release regimen you decide never to look for such history. One size does not fit all, however.

I prefer a "have your cake and eat it too" approach. Keep the commit history. Use readily available tools to squash when performing analysis should you choose:

`git log -p --first-parent` (git 2.31+; the same ability is available in earlier versions of git with different syntax)
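A minimal sketch of what that view looks like, assuming a throwaway repository. The file name `app.txt`, the branch name `feature`, and the ticket ID are made up for illustration. The branch history is kept, yet the first-parent view of `main` shows one entry per merge:

```shell
set -eu
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main  # make sure the first branch is called "main"
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt
git commit -q -m "initial commit"

# Feature branch with two work-in-progress commits.
git checkout -q -b feature
echo one >> app.txt
git commit -q -am "wip: first attempt"
echo two >> app.txt
git commit -q -am "wip: fix the first attempt"

# Merge with an explicit merge commit so the branch history is kept.
git checkout -q main
git merge -q --no-ff -m "Merge feature (ABC-123)" feature

all_commits=$(git rev-list --count HEAD)
first_parent_commits=$(git rev-list --count --first-parent HEAD)
echo "all commits: $all_commits, first-parent view: $first_parent_commits"

# The "squashed" view of main: only the merge and the initial commit.
git log --oneline --first-parent
```

Adding `-p` to the last command shows each merge as a single diff against its first parent on recent git versions.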

tharkun__ · on May 1, 2021

I would argue that you only lose irrelevant information and you gain the ability to roll back. Without squashing, you are actually way worse off for a "10 times a day release regimen". We release every hour and we squash and rebase with a straight master history.

This enables us to almost mechanically just roll back to the previous commit that was out on Prod, should something happen, and it's very easy to skip (revert) just one ticket and let the rest go out. No guessing, no manual figuring out which 7 commits belong to the ticket in question. Potentially 6 of them had the ticket number in the commit message like they should, but in the 7th, which coincidentally is actually commit number 3 in the sequence and is interleaved with other tickets' commits, the developer switched some numbers and wrote ABC-132 instead of ABC-123. Now we have a production incident and a completely garbled attempt to undo it.

We have none of those issues. Each ticket is one commit. If you revert one commit, you are guaranteed to have a working piece of software and you won't have a potentially not even compiling intermediate commit that was subsequently fixed up during a PR, which can happen without squashing.

gouggoug · on May 1, 2021

>No guessing, no manual figuring out which 7 commits belong to the ticket in question, potentially 6 of them had the ticket number in the commit message like they should, but the 7th, [...]

Lots of people it seems don't know this: _you can revert a merge commit_, which includes _all_ the commits that were part of the merge.

So, if you have a feature branch with 7 commits and you merge those 7 commits _with a merge commit_, then when you need to roll back, you revert _the merge commit_, which includes all 7 commits.
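As a rough sketch of that, assuming a throwaway repository (the file, branch, and ticket names are invented for the example): `git revert -m 1` tells git to treat the first parent, i.e. the mainline, as the side to keep.

```shell
set -eu
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt
git commit -q -m "initial commit"

# Feature branch with several commits, merged with a merge commit.
git checkout -q -b feature
echo one >> app.txt
git commit -q -am "ABC-123: first part"
echo two >> app.txt
git commit -q -am "ABC-123: second part"
git checkout -q main
git merge -q --no-ff -m "Merge feature (ABC-123)" feature

# Undo the whole branch in one step by reverting the merge commit.
# -m 1 selects the first parent (main before the merge) as the mainline.
merge_commit=$(git rev-parse HEAD)
git revert --no-edit -m 1 "$merge_commit" >/dev/null

content_after_revert=$(cat app.txt)
echo "app.txt after revert: $content_after_revert"
```

One caveat from the git documentation: after such a revert, merging the same branch again will not bring the changes back unless the revert itself is reverted first.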

jka · on May 1, 2021

The fact that not many people know this unfortunately leads to a side-effect that it is less widely supported by third-party tooling around git.

In terms of simplicity (for example, 'git revert x' is natural-language-like and expresses the intent and meaning), and also in terms of tooling compatibility, that generally leads me to prefer squash commits.

One other thing that squash commits can enable is a sense of developer freedom within their branch(es). There's no need to keep the history super hygienic within each branch; there can be plenty of reverts, experimental commits, etc; the eventual merged product ends up looking clean regardless.

u801e · on May 1, 2021

But if only 1 of the 7 commits was actually the cause of the issue, then why would you want to revert all 7 of them instead of just the one?

tharkun__ · on May 1, 2021

If you have 7 commits it's very likely that 3 of those commits at least don't build because they were intermediary checkpoints. Then there's the one commit that contains the buggy code and if you revert that it breaks the whole thing and it doesn't work. Now you gotta revert the whole thing anyway, all 7 commits (or the merge).

Why bother with all this time and energy? If each single commit in your straight master history is one self contained thing it's easy to revert that whole thing and done.

We are also talking SaaS here with releases going out multiple times a day. So if something is very broken it's likely to be noticed very quickly and the breaking change is probably the last commit on master and you don't even have to revert it. You just deploy HEAD^ to Prod while you create a PR to fix the problem.

u801e · on May 1, 2021

One can ensure that every commit passes the tests suite and linting by running something like:

    git rebase --exec "test_cmd" --exec "lint_cmd" base_branch

and any commit that doesn't pass can be fixed until you get to the point where all commits in the branch pass.

Then you can find the commit that caused the issue, revert it and then make another commit that fixes the issue in the next PR (and ensure that the test suite passes for both commits with the --exec parameter to rebase).
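A small sketch of the `--exec` workflow, assuming a throwaway repository. The script `check.sh` is a hypothetical stand-in for `test_cmd` and `lint_cmd`; it just logs which commit it ran on and fails if the file contains the marker `BROKEN`.

```shell
set -eu
work=$(mktemp -d)
repo="$work/repo"
mkdir "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt
git commit -q -m "initial commit"

git checkout -q -b feature
echo one >> app.txt
git commit -q -am "first change"
echo two >> app.txt
git commit -q -am "second change"

# Hypothetical check script, kept outside the work tree so it does not
# show up as an untracked file during the rebase.
export CHECK_LOG="$work/check.log"
cat > "$work/check.sh" <<'EOF'
echo "checked $(git rev-parse --short HEAD)" >> "$CHECK_LOG"
! grep -q BROKEN app.txt
EOF

# Replay every commit of the branch onto main and run the check after each.
# A failing check stops the rebase at that commit so it can be fixed.
git rebase -q --exec "sh $work/check.sh" main

checked=$(wc -l < "$CHECK_LOG" | tr -d ' ')
echo "commits checked: $checked"
```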

> If each single commit in your straight master history is one self contained thing it's easy to revert that whole thing and done.

Not really. The more lines and files that are affected by applying the commit, the more likely there will be conflicts when trying to revert it. That won't be the case if the commit (or merge commit) is the current head of the branch, but if other commits and merges have been made since then, conflicts are far more likely.

On the other hand, smaller commits are much easier to revert because they tend to not change many files or many lines of code.

> We are also talking SaaS here with releases going out multiple times a day. So if something is very broken it's likely to be noticed very quickly

That's assuming that it's "very" broken and it's noticed quickly. Those assumptions don't always hold.

tharkun__ · on May 1, 2021

I personally think that individual commits are mine and nobody should care. They help me. They're not meant to live forever and if you make me make all of them even compile let alone build green then I will squash before ever pushing. A red build should result from the first commit on a bug fix if you ask me.

I think we just fundamentally think differently. Your branches seem much longer lived. And each of your commits is more like the small commits that for us would sit directly on master.

Agreed, they do not always hold. But most of the time if something really bad happens it gets noticed and we roll back to the previously deployed commit. Stuff that is only noticed later is in most cases not such a big thing and is handled as a normal bug fix. Exceptions prove the rule obviously.

We also try to make individual tickets and PRs small, yes. It all plays well together and into each other. It has many advantages like easier PRs from the reviewer's point of view. More likely for reviewers to actually find something useful to say as 'review fatigue' (aka 'I already read aaaaall this code and there's moooore? And teeeests? Waaaah! Whatever! ' and then they click approve) is less likely. Smaller tickets mean they get done faster and the pace that sets is more predictable than few large ones where execution time can vary considerably. Product people like predictability. You also deliver smaller chunks of changes so customers can give you more fine grained feedback vs just a large change that changes everything at once and they just go 'I hate change, undo this! Now!'. And should something really be of no use individually, feature flags are an option. Or a feature branch that doesn't live too long.

u801e · on May 2, 2021

> I personally think that individual commits are mine and nobody should care. They help me. They're not meant to live forever

But they don't really help when reviewing because they're hard to make sense of. But if you want to make a large change in a single commit, then why not just stage the entire thing as a single commit in the first place. You can address review comments by amending the commit and force pushing it.

By breaking it down into sensible commits, it makes it easier for the reviewer to review your change by filtering it by commit, so that they can see a subset of it.

> You also deliver smaller chunks of changes so customers can give you more fine grained feedback

Some of those changes involve some degree of refactoring. Typically, I'll separate the refactor part from the implementation part as separate commits. If I were to just make a PR that just did refactoring, then how would I justify merging it and deploying it to production?

Also, making a bunch of small changes and merging them separately takes away the association between them, unlike what you get by keeping multiple related commits in a branch.

tharkun__ · on May 2, 2021

I make individual commits for addressing review comments. It's sometimes helpful to see this during further review both for existing or new reviewers. They get squashed like any others afterwards. I don't see how it would be valuable after something is merged to master that I renamed a bunch of variables, extracted a method somewhere and added one test after a reviewer commented that I missed a case.

Refactorings are sometimes done with a separate PR and sometimes not. Depends on the size. From our discussion we might have different philosophies on when to do that. And yes at our company merging and deploying a refactoring to master is totally acceptable and done frequently. Even if the ticket takes longer in the end, gets scrapped etc. the refactoring can probably stand on its own and be valuable if we extracted it to its own PR.

Association should be there through tickets. We may have a different idea of what small is and what will be in such a commit. Let's have a made up example:

You are building something that has a table view for some data. That feature is out there. But the table doesn't have sorting capabilities and no filtering and no paging. It just displays everything.

There are individual tickets for adding each of these. Sorting is a small PR because the table component you use actually has that capability. You add the flag to enable sorting and the definition for the default sort and release it.

Filtering is another ticket. Filtering is harder. The API this is based on doesn't support filtering and isn't owned by your team. If this had been part of the same feature branch and released together customers would have nothing yet. Instead they have sorting already. You decide to do filtering via API and not in the UI after fetching everything. So off to create a ticket for the other team and talking to them. Maybe you can help out and do it for them but in any case this will take a while.

While waiting for their answer you move on to Paging! Yet another ticket. You notice that your table component doesn't support infinite scrolling. You want to have that though instead of individual paging. You create two PRs. One to the table component which adds infinite scrolling. This is an individual PR that is regression tested against all the other users of it in your code base. Once that's done you merge it. Why wait? Then you PR the actual change to use infinite scrolling with your specific table. This is at this point still based on that API that doesn't do filtering so you still always retrieve all data. But with lots of data it still helps rendering speed and the API is reasonably fast even with thousands of rows.

You create a new ticket to clean this up and make it work by retrieving paged data once the API also does filtering. Or maybe you never will.

u801e · on May 2, 2021

> I make individual commits for addressing review comments.

Those changes are visible when you make a force push in github since it generates a link that shows those changes.

> Association should be there through tickets.

I've seen companies change ticketing systems several times in my career. Once it changes, all the old links and associations are as good as gone. But if that association is maintained via git, then that's not an issue.

tharkun__ · on May 2, 2021

I like how you first say that doing something in git itself is not needed because a random tool you use but I don't does things so that you can still see it easily and then you turn around and tell me that something is better to be visible in git history itself only because companies change tools.

Weird.

Tell me, in your career, how many times have you seen companies switch source control tools?

I have seen it many many times. In fact I've done some of these migrations for my companies in the past.

I've seen the same with ticketing systems as well. Yes you can lose history in both these transitions. I've worked with systems where either ticket or version history were not available past a certain point. Usually viewing many many years into the past was possible because people realized that historic information can be valuable. But there was a cutoff point that balances out ROI.

Unless required for regulatory purposes maybe, who really needs commit history from 15 years ago? It's cool don't get me wrong. I loved digging through commit history on code that originally was tracked via RCS on a (at the time) ~15 year old code base. And that was about 15 years ago. I'm getting old lol!

u801e · on May 2, 2021

> Tell me, in your career, how many times have you seen companies switch source control tools?

Twice. Once from CVS to SVN (where people started using SVN for new projects and left existing projects in CVS). And once from SVN to git where we used tools to get the SVN history into git.

> who really needs commit history from 15 years ago?

It really depends on how old the code is. I've worked on code that was last deployed close to a decade ago and some of the git blame output was showing commit dates from 2008 (not quite 15 years ago). Unfortunately, the commit messages left something to be desired and didn't really help in terms of figuring out what the issue was.

> I loved digging through commit history on code that originally was tracked via RCS on a (at the time) ~15 year old code base. And that was about 15 years ago. I'm getting old lol!

Same here. I never got to use RCS at work, though I learned about it in school :)

vikingcaffiene · on May 1, 2021

> Each ticket is one commit. If you revert one commit, you are guaranteed to have a working piece of software and you won't have a potentially not even compiling intermediate commit that was subsequently fixed up during a PR, which can happen without squashing.

This is the key point and I liked it so much that I bookmarked it. Hope you don’t mind. :)

I frequently have to debate with my teammates on the value of squash merging into our mainline branches. Another value add that squash merge brings is that it enables what I call “accidental documentation”. You get an easy-to-parse log of work that was done and a link right to the PR with a description and any discussion around it. As a lead I frequently need to coordinate teammates and communicate technical decisions. It’s invaluable to be able to just pull up the commit log and go right to the work in question without having to wade through a bunch of noise.

It also helps a ton with writing release notes.

bob1029 · on April 30, 2021

I file this one under: It Depends™

If your software development process requires that multiple people have a hand in each pull request, or these pull requests are part of a more complex merge graph, then I can clearly see the argument for NOT doing squash merge. This is plainly obvious to me and I would be on your side for not going down the squash path. Knowing who was responsible for each part is a very important thing.

If your software development process only ever has a single author per pull request, and these are only ever directed from work branch->master branch, then I would strongly argue for doing the squash merge option. This is what we do today, because we find it to be the ideal blend of hiding subjective commit styles while still preserving essential knowledge about who did what.

Occasionally, we will break our own rules (oops i didn't squash that one), but we don't make a big deal out of it. There are way more important things to worry about most of the time. There is only ever 1 specific commit hash you build your software at, so it doesn't really matter if the branch has 1 or 10,000 commits in it.

gouggoug · on April 30, 2021

I will always fight tooth and nail against squash merge.

Squash merge has the major disadvantage of getting rid of valuable meaningful git history.

Squash merge is not the proper solution for keeping your git history clean, it is a hack using the side effect of squash.

Keeping your git history clean is a matter of policy, best-practices and education:

Developers should be required to submit _clean_ PRs, that is, PR's whose git history has been organized and refactored in such a way that it removed "clean up commits", "typo fix", etc.

When you squash merge a feature branch that has thousands of lines of code, and 6 months later you have a bug introduced by this feature branch, it becomes extremely hard to find which line introduced the bug.

On the other hand, if you kept the history, and if this history was clean from the get go, it becomes easy to read the commits one-by-one and understand the issue.

Don't use squash+merge.

edit: I see numerous comments saying, in essence, "squash+merge" is what gets rid of the dirty history. No, developers must learn the existence of `git rebase --interactive`, which is the command to use to clean your git history[0]. "squash" is one possible action, among others, that helps clean the history.

[0]: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

calpaterson · on April 30, 2021

> Developers should be required to submit _clean_ PRs, that is, PR's whose git history has been organized and refactored in such a way that it removed "clean up commits", "typo fix", etc.

A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

When you're digging through VCS history due to a bug you often ignore the commit message anyway - if the code did what it seemed to do you wouldn't be there.

sebastialonso · on April 30, 2021

I think I'm missing something here. How valuable is it to have 20 commits of "fix this error" , "fix the fix of the error", "revert all fixes", "real fix".... etc? I'd argue a PR with many commits such as these conveys little to no useful information, when the actual change is 1-3 LOC.

What about cleaning and filtering out useless commits, by soft resetting the branch and committing just the actual changes to merge? Sure, it can take a lot of time, but it's done once, by you; after all, you're supplying the changes. And if it takes a really long time, you're doing it wrong in the first place by submitting PRs with many LOC.
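A minimal sketch of the soft-reset approach, assuming a throwaway repository and a base branch called `main` (the branch name `fix-branch`, the file name, and the ticket ID are invented). The tree is captured before and after to show that only the history changes, not the content:

```shell
set -eu
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt
git commit -q -m "initial commit"

# A branch full of noisy commits.
git checkout -q -b fix-branch
echo "attempt 1" >> app.txt
git commit -q -am "fix this error"
echo "attempt 2" >> app.txt
git commit -q -am "fix the fix of the error"
echo "final" >> app.txt
git commit -q -am "real fix"

tree_before=$(git rev-parse "HEAD^{tree}")

# Move the branch pointer back to where it forked from main, keeping all
# changes staged, then record them as one commit.
git reset -q --soft "$(git merge-base main HEAD)"
git commit -q -m "ABC-123: fix the error"

tree_after=$(git rev-parse "HEAD^{tree}")
commits_on_branch=$(git rev-list --count main..HEAD)
echo "commits on branch: $commits_on_branch"
```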

Have you tried rebasing branches with dozens of short useless commits? That's a real waste of time, and everyone needs to do this if you want to update your main branch. So you've now multiplied the amount of wasted time for everyone.

darknavi · on April 30, 2021

> What about cleaning and filtering out useless commits, by soft resetting the branch and committing just the actual changes to merge?

So... squashing?

phone8675309 · on May 1, 2021

Yes, but the author of the code, who would presumably know their intent, gets to choose how many commits to squash to instead of just one. This means that they can preserve the development process in a logical way.

u801e · on May 1, 2021

What if the change really should be split into multiple commits. Having a mega commit affecting hundreds of lines across tens of files doesn't make reviewing the change easy, and reverting it will result in conflicts.

Seb-C · on May 1, 2021

I would ask such changes to be split into multiple independent PRs

u801e · on May 1, 2021

Then you essentially double the number of commits for a particular change, since it then becomes:

1. first related change

2. first merge commit

3. second related change

4. second merge commit

5. third related change

6. ...

Instead of just having all related changes in several commits with a merge commit at the end that groups those related changes.

Seb-C · on May 2, 2021

The number of commits does not matter. What matters is the quality of it. My opinion is that each PR should be an independent and working unit.

If you have a too-huge PR that can be subdivided, it almost always means you have multiple units of work (feature/fixes) into one branch/PR.

The reason I agree with the article and use squash is that it allows us to have a different history between master and the local branches. Developers can freely subdivide each feature into how many commits are useful to their mindset and workflow, while the master branch will remain clean.

In my experience, the individual scope of each commit on the master branch is almost always correlated to the scope of one code review, and thus one PR.

u801e · on May 2, 2021

Then there really is no need to even have merge commits. But, as far as I know, there is no way to disable the creation of a merge commit when using the merge button on the github PR page.

The one thing you lose with just making one PR per commit is the relationship between a set of commits used to implement a feature. One commit involves some refactoring to make it easier to cleanly implement the feature. The other commit would be the feature and tests, and the last commit would be to add calls/references to that feature.

If these commits were kept in the same branch and merged in a single PR, then it would be easy to see the relation between them. If they were merged as 3 separate PRs, then it would be more difficult to see why the refactoring was done, especially if PRs for unrelated changes were merged in the interim.

calpaterson · on May 1, 2021

> How valuable is it to have 20 commits of "fix this error" , "fix the fix of the error", "revert all fixes", "real fix".... etc? I'd argue a PR with many commits such as these conveys little to no useful information, when the actual change is 1-3 LOC.

I'm not saying that I wouldn't broadly discourage the worst of that but most people do not commit every changed line so it is rarely a problem in practice.

What does become a problem are people who insist on "clean history", "re-order these commits", etc etc at PR stage when you both agree the functional problem is done/fixed and now the negotiation has moved on to your git history. I do contract work (read: move around a lot) and this social anti-pattern comes up very often. Usually one guy who's into it throwing his weight around.

Would it surprise you to learn there is a large crossover between people who do this and people who go forensic on other cosmetic stuff: whitespace, syntax, etc?! :)

rendaw · on May 1, 2021

I've used these commits to identify the correct fix for bugs before. An early commit changed something to do X, but did it wrong. A "more fixes" commit tried to fix it but made it do Y.

It was obvious that the later commit was wrong and I could fix it to do X. If it were squashed I would have thought the original intent was to do Y and spent way more time trying to figure out how to fix it.

Obviously it's not guaranteed but throwing away that information is a bad idea if it's just for some arbitrary aesthetics. Same goes for rebasing to clean up the commit history, which would ostensibly do the same.

pudmaidai · on May 1, 2021

It's not that those commits are in any way useful, it's that this "fix the error" commit might be 3 commits after the commit it belongs to. If there are no conflicts, a `fixup!` commit will handle it for you, but since this is the real world, you're likely going to waste huge amounts of time solving conflicts that help no one.

My solution is:

- small-scoped PRs

- attempt to keep sub-commits readable, but don't waste time on them

Squashed commits also link to the PR where not only do you get the original commit list, but also get the whole discussion around the code.

pizza234 · on April 30, 2021

> A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

I'm also one of those "micro committers".

Because of this attitude, many hundreds of pedantic commit rebases later, I'm now a better programmer (and I also spend next to no time on that type of rebase).

This is because I have now a much higher capacity (and speed) of breaking down problems into smaller, self-contained, steps.

There is a common understanding that VCSs are just storage. VCSs actually map the mental model of the developer. Having a precise, clear, granular VCS history has a bidirectional relationship with being a precise, granular, clear developer.

tablespoon · on April 30, 2021

> A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

Eh. Not really. A clean history is a good resource for figuring out what was changed and why. Cleaning up history isn't even that much work anyway.

> When you're digging through VCS history due to a bug you often ignore the commit message anyway - if the code did what it seemed to do you wouldn't be there.

Only if the commit messages are shitty, which unfortunately they often are. I try to summarize what I'm changing and what I intend those changes to accomplish, because if there's one thing I hate doing, it's spending more time than necessary to puzzle out what I (or a coworker) was thinking at the time.

VCS history can be a pretty powerful tool, but a messy history discourages the use of it.

hbogert · on May 5, 2021

amen!

really, do people even use git bisect? If you do, you go through some rite of passage and I've seen people including myself make better (not perfect) commits. It remains a black art to gauge what the perfect amount of work in a commit should be.

gregmac · on April 30, 2021

> A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

Unless you're using a garbage client (eg, the git CLI) a rebase to get rid of the "typo" "oops" type commits (when you forgot something) takes I'd say 10-20 seconds.

Most commonly I do this when I have several changes on the go at once and forget to commit a fixed unit test, new import, or something like that. I usually realize it right away, so my commit will usually be "add test" or "amend 2 commits ago" or something to remind myself which one it goes to (and it sticks out, because my other commits are usually in the format "PROJ-1234: Add new --host command-line option").

> When you're digging through VCS history due to a bug you often ignore the commit message anyway - if the code did what it seemed to do you wouldn't be there.

I presume this is doing a bisect or something to narrow down where a bug was introduced?

My experience comes from the opposite side. Usually I identify the line of code causing a bug, it makes me go "wtf, what is this even supposed to be doing?" so I do a git blame. Hopefully the commit message is useful and ideally leads me back to the original bug/ticket, so I can figure out what the original intent/fix was, rather than inadvertently breaking something (or re-causing a different bug it fixed). I'll also note that in code that is well-commented and well-tested, this step is rarely necessary.

As an example, say I come across something like this:

    if (port > 443) ignoreErrors = true;

Clearly, this was added to solve some problem, but even without any context it's pretty obviously a bad solution to whatever that problem is. Git blame lets me go back to see what the original reason/bug was, evaluate if it's still relevant and then either fix it properly or safely remove this line.

tharkun__ · on April 30, 2021

Whoa whoa, hold your horses. I have used nothing _but_ the git CLI for all my commits, rebases and squashes for the better part of 10 years now.

I do agree with you though that rebasing and squashing are absolutely easy and awesome and whoever doesn't agree probably comes at it from a badly run repository with utterly bad practices and it would probably _benefit_ from a proper squashing and rebasing practice.

FalconSensei · on April 30, 2021

> My experience comes from the opposite side. Usually I identify the line of code causing a bug, it makes me go "wtf, what is this even supposed to be doing?" so I do a git blame.

If the commit message is something like: PROJECT-TICKETNUMBER Description, you can you know... see the ticket/issue along with full details and discussions?

codesnik · on April 30, 2021

git CLI isn't so bad for that, why? I do --fixup commits all the time, git rebase -i gives me all the power to reorder, merge and even split commits.

Quekid5 · on May 1, 2021

While the typical basic rebase isn't so bad from the CLI, something like magit is faster, and much much faster for anything apart from the basic rebases.

hashhar · on May 1, 2021

Fixup with autosquash is so damn convenient. I wonder if the people complaining have ever used them.

hashhar · on May 1, 2021

Have people not seen autosquash when rebasing? And the --fixup and --squash options to commit?

Check those out. It makes the rebase automated with correct commits getting fixed up or squashed as you desire.
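For anyone who has not seen it, here is a minimal sketch, assuming a throwaway repository (file and branch names are invented). `GIT_SEQUENCE_EDITOR=true` is only there to accept the generated todo list without opening an editor, so the example runs unattended:

```shell
set -eu
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo

echo readme > README
git add README
git commit -q -m "initial commit"

git checkout -q -b feature
echo helo > a.txt                      # typo that gets noticed one commit later
git add a.txt
git commit -q -m "add greeting"
echo other > b.txt
git add b.txt
git commit -q -m "add other file"

# Fix the typo and mark the fix as belonging to "add greeting" (HEAD~1).
echo hello > a.txt
git commit -q -a --fixup=HEAD~1        # subject becomes "fixup! add greeting"

# --autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

commits_on_branch=$(git rev-list --count main..HEAD)
fixed_content=$(git show HEAD~1:a.txt)
echo "commits on branch: $commits_on_branch, a.txt in first commit: $fixed_content"
```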

sethherr · on April 30, 2021

> I'll also note that in code that is well-commented and well-tested, this step is rarely necessary.

This is the most important line from this comment.

If I'm using blame or bisect, it means I did something wrong.

I merge 10 times a day - and if it takes me a minute, because I'm slower than you - that means it's ten minutes a day. Add the cognitive load of the whole process and the added time of explaining it to other team members - 10 minutes is an underestimate.

I'm going to spend that time writing tests and comments, since even you note that they're the better option.

Squash is good enough. I spend my energy improving the right things.

throwaway1777 · on April 30, 2021

Strongly disagree. Mainline branch commit history should be clean atomic commits that you can roll back. Otherwise you are just asking for trouble on any merge conflict or hotfix scenario.

tharkun__ · on April 30, 2021

You have apparently never had the benefit of a properly run git repository.

Our master has straight history. We always rebase to merge a PR and we have build scripts that check this as well. You will not be able to merge a PR that tries to get more than one commit onto master, if the commits touch more than one of the modules (we have a monorepo with lots of individual services). Until the PR has been approved we leave individual commits on the branch though as that can be helpful.

I have not had to "mess" with rebases ever. Most rebases apply cleanly and if they don't the conflict resolution is mostly very easy. The 'hardest' conflicts to solve were actually in the end the easiest to solve, because all you had to do was to `git rebase --skip` the appropriate commits. Of course some developers that didn't know better spent hours trying to do those conflict resolutions and complained loudly. Fortunately they were that loud, so we caught it and just did the skipping. Everything applied cleanly without any conflicts after that.

It's pretty awesome to have one commit per ticket on master and especially if something goes wrong. I can easily `git bisect` to find which ticket introduced a bug or a regression and I don't ever have to mess around. All commits have to have a ticket number in the commit message.
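A rough sketch of that bisect, assuming a throwaway repository with one commit per (made-up) ticket. The check passed to `git bisect run` is a hypothetical stand-in for a real test: it reports a commit as bad when `app.txt` contains the marker `BUG`.

```shell
set -eu
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt
git commit -q -m "initial commit"
good=$(git rev-parse HEAD)             # last state known to work

# One commit per ticket; the third one sneaks in the bug.
for i in 1 2 3 4 5; do
  if [ "$i" -eq 3 ]; then
    echo "BUG" >> app.txt
  else
    echo "change $i" >> app.txt
  fi
  git commit -q -am "ABC-10$i: change $i"
done

# HEAD is bad, $good is good; let git drive the search.
# The command must exit 0 for a good commit and non-zero for a bad one.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

With one commit per ticket, the subject of the first bad commit names the ticket directly.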

It also enables firefighters to almost mechanically solve production issues. Prod is broken after the last production deploy? There's very very likely only one new commit since the last deploy before that (or maybe a few). In any case, just revert to the commit of the previous deploy, lock down the deployments and now you have time to properly solve the issue without time pressure. If there was more than one commit and you can pinpoint which of them caused the issue it would even be very easy to just revert that one commit and let deployments continue. No dealing with 27 commits that make up one ticket.

Know thy tools!

ThrustVectoring · on May 1, 2021

> if the commits touch more than one of the modules (we have a monorepo with lots of individual services)

That seems wrong to me - a big advantage to having a monorepo is that when you need to make a change to how two services work together, you can do so with a single atomic commit.

tharkun__ · on May 1, 2021

I may not have been clear. You can merge a single commit to master (as a fast-forward rebase) regardless of how many modules it touches. But you can't do it with more than one commit. The check will refuse this because we don't want non-squashed stuff on master and it was an easy enough hack to prevent that. Yes, it's not foolproof, but it has worked well enough so far. It discourages feature branches too, which we also want. All these scripts are part of the monorepo and if I need a feature branch for something I can easily exclude that branch in these checks. I've only done that once in the last few years.
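For illustration only, a check along these lines could be as small as counting how many commits a branch is ahead of master. This is a simplified guess, not the actual script: the function name `check_single_commit` and the branch names are hypothetical, and it ignores the per-module condition and the branch exclusions mentioned above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.email "dev@example.com"
git config user.name "Dev"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Hypothetical check: a branch may be at most one commit ahead of master.
check_single_commit() {
  ahead=$(git rev-list --count "master..$1")
  [ "$ahead" -le 1 ]
}

# Branch with one squashed commit: should pass.
git checkout -q -b ticket-a
echo a > a.txt
git add a.txt
git commit -q -m "ABC-1: add a"
check_single_commit ticket-a && result_a=ok || result_a=rejected

# Branch with two commits: should be refused.
git checkout -q -b ticket-b master
echo b1 > b.txt
git add b.txt
git commit -q -m "wip"
echo b2 > b.txt
git commit -q -am "ABC-2: finish b"
check_single_commit ticket-b && result_b=ok || result_b=rejected

echo "ticket-a: $result_a, ticket-b: $result_b"
```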

This stuff is also very fluid and gets changed and improved as we go. If the easy solution turns out to not work well enough for enough common cases _then_ we spend more time making it better. Otherwise 'good enough' does the trick for us. YMMV depending on your company size and developer culture. E.g. you might have too many cowboys that just add exceptions all the time or remove the checks entirely. Nobody can help you there. Fire them or flee ;)

da39a3ee · on May 1, 2021

When a bug is found in production, it should usually be fixed by:

1. Make a commit supplying the test coverage that was evidently lacking.

2. Make a commit supplying the fix.

Everything you say is valid. However, what I just described is the simplest and clearest example of valuable multi-commit-per-ticket git history.

tharkun__ · on May 1, 2021

I myself do and encourage my guys to do multiple commits to checkpoint their work. Also for bug fixes as you describe, first commit is the test proving it's broken. This should turn the build red. Then another commit with the fix which makes the build green. All of this on the branch. PR it like that. It's valuable to see this history.

Once the PR is approved and before merging to master, rebase and squash. The build failing intermediate commit is no longer valuable.
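One way this could look on the command line, as a sketch in a throwaway repository. The branch name `fix-ABC-7` and the file names are invented, and master has not moved here, so the rebase step is a no-op and is left out:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"
echo buggy > app.txt
git add app.txt
git commit -q -m "base"

# On the branch: first the failing test (red), then the fix (green).
git checkout -q -b fix-ABC-7
echo "test for bug" > test.txt
git add test.txt
git commit -q -m "ABC-7: failing test"
echo fixed > app.txt
git commit -q -am "ABC-7: fix"

# After approval: collapse the branch into a single commit on top of master.
git reset -q --soft "$(git merge-base master HEAD)"
git commit -q -m "ABC-7: fix bug, with regression test"

# master can now take the change as a plain fast-forward, no merge commit.
git checkout -q master
git merge -q --ff-only fix-ABC-7
count=$(git rev-list --count HEAD)
echo "commits on master: $count"
```

`git rebase -i` with `squash`/`fixup` lines or `git merge --squash` would get to the same end state; the soft reset is just the shortest non-interactive variant.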

da39a3ee · on May 1, 2021

> The build failing intermediate commit is no longer valuable.

At times what you say will be correct, but it's not at all obvious that what you say is always correct. Suppose that subsequently the authors of the fix in question have a horrible realization that there is something wrong with their fix, and furthermore they are panicking. Now, as a helpful colleague introducing some calm to the situation, wouldn't it be reasonable to recommend that they go back to the state of the codebase where (a) the original bug is present, and (b) it is demonstrated by a test?

But with your squash, no such state exists on main[0]. To my mind it feels a bit janky to have to check out some non-master commit from a feature branch to go to that codebase state.

--

[0] I am not an ideologue but I do kind of like main over master as it is shorter and clearer.

tharkun__ · on May 1, 2021

I usually go with what makes sense in most situations instead of complicating my life with stuff that may, perhaps, one day, if the moon is full and the stars are misaligned not work perfectly.

That said, I do not need to have an individual commit to do what you are saying. In fact our default is to remove all branches after the merge to master. But each fix is just one commit anyway. So the state you describe is actually just the commit on master that is before the fix. No need for finding old branches and finding the commit that has the test only.

The thing with the tests is also that this is an ideal scenario. Not everyone does this. I don't always do it either, or can't. But I strive to. So relying on it would be bad. To do what you said regardless of whether there was a first commit with a test or not, I simply need that one squashed commit, which is very likely very small. I create a patch from it (meaning a simple diff between that commit and the previous one), remove all changes that aren't tests, and apply the result to the pre-fix commit. This is very likely very easy because tests are in separate folders and it's a small patch file. Probably takes less than a minute to do. Will not always work 100% but good enough in many cases and I can count on one hand how often this would have been needed in my entire career.

(in fact I did this for a similar thing recently. I needed to apply some of the code from a spike. I created the patch, deleted unwanted stuff from the patch file and then applied it. Easiest ticket I did in a while ;))

Basically: 'git diff <prevCommitHash> > abcd.patch' then 'git checkout <prevCommitHash>'. 'vi abcd.patch' aaand 'patch -p1 < abcd.patch'

Re: Master. That's the name of the branch as per git default. Not much I can do there ;)

david_allison · on May 1, 2021

There's negative value in having a commit in the history which turns the build red; it should be removed, but the requirement that the branch is squashed onto master doesn't follow.

It'd be a perfectly valid alternative in some cases to clean the commit history and rebase a number of atomic commits onto master if it would make the history more readable.

da39a3ee · on May 1, 2021

Suppose

(a) You suddenly realize that the production code in the main (not deployed) branch is incorrect

(b) There is no test coverage for the bug

(c) You have time to commit a test reproducing the bug, but a fix will take days.

In this situation you think there is negative value in committing a failing test that will make the build go red? Perhaps you were forgetting that sometimes commits alter production code paths and sometimes they alter test code?

david_allison · on May 1, 2021

If you turn the main branch red, you affect the workflows of all developers who pull until you turn the build green again. It may also affect git bisect depending on the bisect workflow used.

The ideal would be to revert the change if known, or disable deployment on the branch until the issue is fixed with an ignored test*. I feel a workflow where you explicitly break the build [incl. tests] when a bug is discovered isn't ideal.

tharkun__ · on May 1, 2021

I agree. This is not how I meant it (OP here again).

Our master is always kept green. If master is not green we are dead in the water. If something happens and master is not green we cannot deploy any fixes.

As you say we either revert the breaking commit and push that out fast or we deploy the last known good commit. This may just be HEAD^ if there was only one new commit going out. It might also be several commits ago. Depends on what got merged recently and made it into that deploy.

On bisecting: we keep the actual images for some time. So if you don't need to bisect something rather old, then you don't need to build anything to bisect. Only deploy and check whether you had a good or bad commit. A deploy takes less than a minute usually vs a ~10 minute build.

u801e · on May 1, 2021

> You will not be able to merge a PR that tries to get more than one commit onto master

What value does the merge commit in the PR provide if a PR only has a single commit? Couldn't you essentially halve the number of commits in the commit history by amending the single PR commit message to contain the text that the merge commit will contain anyway?

tharkun__ · on May 1, 2021

In case it wasn't clear from my post, I am very all over the place with the word 'merge'. I usually use it to describe the concept. We don't ever have actual merge commits. We merge by rebasing and that then makes the 'merge' just a fast forward.

mrkeen · on April 30, 2021

> You spend more time messing about with rebase than solving problems

Rebase should be quick and straightforward. If it's not, you've got bigger problems.

By rebasing, you're taking responsibility for changing the codebase as it exists now, rather than how it used to be.

thom · on April 30, 2021

Do people here have examples of some bugs for which they had to resort to VCS history to find the cause? I'm struggling to picture a single bug in my whole career where this would have been quicker than just following the logic of the code. If there's information in commit messages that isn't evident in the code itself, that seems a terrible way to live.

NateEag · on May 1, 2021

What follows is a summary of a time bisect saved a team I was on from a lot of pain. It's been a long time so I may have some details wrong, but the point should be clear.

Years ago I worked at an e-commerce shop.

One fine day we discovered that the checkout process had been broken in a very particular corner case at some point in the past several months.

No one had any idea how it had happened.

Most of the team was trying to work out what could have possibly gone wrong by staring intently at code in their IDEs and debuggers while stepping through reproductions.

I thought, "everyone's already doing that - I'll try a bisect, since we have a reliable reproduction for this inscrutable weirdness."

It took maybe twenty minutes before I could point to a < twenty-line change.

The author of said change was flummoxed at the breakage, as it was nowhere near the actual checkout code, but a few minutes of the whole team staring at it revealed a classic piece of PHP spaghetti insanity that had managed to break the checkout process (I no longer recall the specific issue - might've been global variable name collision).

I think it would have taken hours before we got to an answer without bisect.

Granted, it only worked so well because of good commit hygiene (which I had trained everyone to use when I introduced git and code review to that team), but it showed quite clearly how useful good commit hygiene is when paired with bisect.

detaro · on April 30, 2021

As parent said, history is not just commit messages.

Trivial example from yesterday: "this worked last week... what did they even touch inbetween?", look at the most relevant commit diff, context makes it obvious someone just accidentally deleted a line too many while replacing a code block.

For more complex stuff, it's helpful for "what were they trying to do with this line", "what's the requirements document they worked off when they wrote this", "which other versions are likely to have this bug too", "Is this new and I can go ask whoever wrote this about it or is it years old", ...

david_allison · on May 1, 2021

"had to" is a stretch, but as the first example from GitHub: https://github.com/ankidroid/Anki-Android/issues/7781

There's a lot of value in bisecting to find the exact commit, moving from "this is probably the commit that broke it" to "this is exactly the commit which broke it"

Typical workflow:

* branch off master

* Write a test which passes at some point in the past, and fails on HEAD

* rebase the test into the branch at the point in the past, so bisect works

* git bisect to pinpoint the bug

With a non-squashed history where each commit builds, `git bisect` points to the exact failing commit and makes the investigation trivial

An additional advantage is that you're not wasting time investigating: the unit test can be used as a regression test

u801e · on May 1, 2021

> Do people here have examples of some bugs for which they had to resort to VCS history to find the cause?

A more common scenario is to run git blame on the files you plan to modify to see what commits were responsible for them. If the message is detailed enough, you can avoid introducing a regression because you altered logic introduced by a commit to fix a bug.

> If there's information in commit messages that isn't evident in the code itself, that seems a terrible way to live.

Good commit messages include why a change was made, which usually isn't really reflected in the code itself. Comments may be outdated if the code is changed without updating the comments, but the git blame output for a line of code is timeless because the association between that commit and line of code isn't lost until the line itself is updated.
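A small sketch of that lookup, with a made-up config file and commit messages: ask `git blame` which commit last touched a line, then read that commit's message for the "why".

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'timeout = 30\nretries = 3\n' > settings.conf
git add settings.conf
git commit -q -m "Add default settings"

# A later bug fix changes line 2 and says why in the message.
printf 'timeout = 30\nretries = 5\n' > settings.conf
git commit -q -am "ABC-9: raise retries to 5, upstream drops ~2% of requests"

# Which commit last touched line 2? The first field of the first
# porcelain line is the commit hash.
sha=$(git blame --porcelain -L 2,2 settings.conf | head -n 1 | cut -d ' ' -f 1)
reason=$(git log -1 --format=%s "$sha")
echo "$reason"
```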

Noumenon72 · on May 1, 2021

* Commit that added line was fixing something unrelated; deduced line was duplicated in merge resolution.

* Line was comparing two different enums for equality and failing; commit history showed one enum had been an int and the commit message that changed it explained why the author thought the comparison was safe anyway, so we could ask why that wasn't true

Plus even when the _cause_ is in the code, I constantly check "what's the history of this function?" while trying to figure out whether the code was always intended to do this crazy thing or whether it's been morphed by recent changes. I have "Show selection history" mapped to Ctrl+Shift+G+H in PyCharm.

pizza234 · on April 30, 2021

As I wrote in another comment, bisecting (which is for me a significant tool) relies on history (specifically, a granular one). However, it also must be a disciplined history.

calpaterson · on May 1, 2021

For example when a bug has messed up data and you want to know when it was introduced. Then you know how far you have to go back to correct it.

Other times knowing how it was introduced can give you a sense of what change the original author was trying to make.

thom · on April 30, 2021

What's an example of a bug you've had to bisect for recently? Forgive me, it just seems like such a last resort thing.

LocalPCGuy · on April 30, 2021

I don't bisect regularly, but when I do, it's usually trying to figure out what the original reason for introducing the code that is problematic is. The entire point is on projects big enough, you may not be able to just "follow the logic" enough to know that your fix to the apparent bug isn't re-introducing some regression that was fixed previously.

So bisect to me, is a way to figure out where and why the code was changed the way that it was, so I have a better understanding of the changes I can make going forward.

That doesn't mean the change made in the past was correct and must be maintained (obviously something is broken), and it may well be that the obvious solution based on following the code logic is correct. But that doesn't mean I wasted time making sure I best understand the reasoning behind the changes made. This is also why I don't resort to it very often, because it isn't necessary in all cases (I'd even say it isn't necessary in most cases).

Bisect also allows you to see other changes made in the commit in question, and around that commit, so you get a better overall picture of the logic.

thom · on May 1, 2021

If you've got a codebase where the intent of the code isn't clear, where you can't track bugs, and where you can't fix them without being confident you're not breaking other stuff, surely those are all fundamental problems worth addressing?

JoshuaDavid · on May 1, 2021

Yes, but those things take time to address and in the meantime the need to make changes that address more immediate needs doesn't go away. Bisect specifically helps if you've got an easily reproducible bug in a hairy part of the codebase and you know that bug could be arbitrarily old but is on a path that recently started getting exercised much more often. I use it maybe once a month but when I do it is very nice to have that context of how the bug happened and any other places that might need to be fixed.

thom · on May 1, 2021

Sure, but at that point there's a whole raft of practices that seem more important than 'having a nice commit history'.

JoshuaDavid · on May 1, 2021

Bisect doesn't require a nice commit history, just a granular one (well, it doesn't require a granular one either but it's a bit less helpful if it lets you identify the problem commit as the +18000/-7000 LOC "Overhaul payment system <7 paragraph description of new payment system>" than if it drops you on the +3/-3 LOC "fix encoding bug <no description of what the bug is or what the fix does>")

Edit: I should also add that if you use github, squash-merging pull requests is fine because you can pull down and check out the pull request branch and run bisect on that. Or more generally if you don't delete history and keep a record of what commits were squashed to make your giant commit, your friendly local maintenance programmer might grumble slightly but will still be able to work effectively. Just please avoid erasing history entirely.

tharkun__ · on April 30, 2021

I can give you an example though I have to be sort of vague for obvious reasons too. In fact, I have been bisecting to find the cause of a bug 3 times in the past 3 to 4 weeks.

Customer Service reported a problem with something that my team is responsible for. I knew almost for certain that we didn't break it. I had a vague idea that another team might have broken it by a recent-ish change to a different service that we both rely on but aren't the maintainers of (we have a monorepo). In fact they made multiple changes to it recently (they put me on the PR for awareness).

I don't know the code for that service but it was really easy to check for the occurrence of the bug (literally two clicks in the UI to reproduce or not reproduce). Instead of trying to understand a service that I don't own, which would require mental concentration, I randomly checked out a commit from 4 weeks ago, tested => bug not reproducible and started a bisect. I was actually gonna be stuck in some meetings for the next few hours but bisecting is mechanical, so I was able to just do it on the side (deploying is a little bit of waiting in between, you gotta refresh etc.) and a little later I was able to just paste a very neutral "XYZ is the first bad commit" into the ticket. Totally cuts down on the 'drama' too as you're not simply 'accusing' someone else of causing an issue in _your_ part of the application. It's just there, very neutral, there's the commit that caused it. Git told me!

thom · on May 1, 2021

But that just sounds like an _incredible_ amount of effort and even at the end you still don't actually know the cause of the bug. Surely you just raise it to someone who owns the code in question? It sounds like you know who they are already?

tharkun__ · on May 1, 2021

Until you have done it it may sound like an incredible amount of effort but in fact it isn't at all. Your reality might be different, but we also deal with different time zones. In this particular example the other team was in a different time zone and already gone for the day. The easiest and most efficient way was a 'dumb' bisect.

If you _only_ raise it to the team in question, you have nothing in hand but suspicions. Their natural reaction is most probably going to be "No can't be us, must be something/someone else" and you've wasted an entire day waiting for their response that was predictable anyway and you've also just increased both their and your own irritation level. Theirs because they probably feel like you're trying to blame them for something they feel is not their fault. Your own because you've just wasted a day trying to solve this and meanwhile customer service thinks you caused a regression.

A bisect that literally took zero brain cycles and told me which commit caused it before I was even halfway done with my meetings allowed me to let them know exactly which commit caused it without a doubt (unless your company is so broken that they wouldn't even believe the bisect output :)). This also gives them very valuable information because instead of them now trying to have to figure out which of their changes maybe caused this thing I'm 'accusing' them of, they have a single commit, which is directly tied to a single ticket. In this particular case, the developer in question actually knew right away from seeing the commit message/ticket number what particular part of their change was the cause of the issue, without even having to dig through the code.

thom · on May 1, 2021

If an issue isn't urgent and you aren't going to fix it, why spend time on it? Why not send it to the team that you already know is responsible? Why do you have teams that refuse to investigate their own issues and instead have weird emotional reactions to bug reports? Why are they more likely to accept a git bisect that took zero brain cycles from you rather than do it themselves in zero brain cycles? This all just sounds completely dysfunctional. Flee!

tharkun__ · on May 1, 2021

As my sibling already points out correctly, no need to flee, not completely dysfunctional at all and very very common in both functional, semi-functional and yes also dysfunctional companies.

At the point where I have _not_ done the bisect, I in fact do not actually _know_ who is responsible. I have a suspicion, a hunch, an assumption. Assumptions are bad and need to be validated. I can either let the other team that I suspect validate it. If it turns out that I was wrong, I have both wasted their and my own time and reduced my credibility. What then? I try to suspect another team and try the same thing again? That's what happens in a lot of large companies all the time. Nobody looks into anything and just tries to pawn it off to someone else. If someone tried this more than a couple of times with me, I definitely know what I would do the third time around.

They are more likely to accept a bisect from me, because it shows that I am not simply trying to pawn something off to them based on a mere assumption. It shows that I care and that I validate my assumptions and don't just try to pawn things off to other teams and play 'hot potato'. That would indeed be dysfunctional.

jaredsohn · on May 1, 2021

Not the GP, but this kind of thing can happen all of the time.

The people experiencing the bug don't know who is responsible so they arbitrarily pick someone. The bug could get passed back and forth endlessly between teams until somebody isolates it a little better and bisecting is a good tool for doing that. Just doing the bisecting rather than arguing with the other team about who should do the bisecting is a way to get things done. If it does turn out to be the other team, it may be easier to convince the other team to take on similar requests in the future.

detaro · on May 1, 2021

What's so incredible about spending a bit of time parallel to other things on identifying a bug?

pizza234 · on May 4, 2021

Keep in mind that in cases where a bug is programmatically reproducible (typically via existing or "upcoming" automated test coverage), bisect can be automated trivially.

ThrustVectoring · on May 1, 2021

I've used version control history to verify that a mysterious bug was not caused by any changes in the codebase. Instead, we were set up to automatically incorporate minor version changes in dependencies, one of which improperly had a breaking change.

This class of bug is impossible to find by following the logic of the code because it's not in the code.

Rapzid · on April 30, 2021

I agree with you in part, depending on the project, if that means just squashing it down first. This at least can keep the integration branch somewhat sane and without a bunch of intermediate commits that are junk and won't build.

For this reason I prefer people to rebase on the integration branch instead of merging it in to stay current; unless there is a reason not to. And a lot of open source projects require PRs to be pre-squashed or be able to be squashed before merging!

However, there is a time and place still for deliberately crafting the PR commits. For instance if you are wanting the reviewer to be able to review chunks of the PR in isolation/stages, or if you want each of the commits to be buildable.. I suppose also for review/testing.

This all depends on the project and other factors like the wider org.

u801e · on May 1, 2021

> A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

That can be avoided by only staging and committing changes when the change is actually completely implemented. For example, I'll make a change and then I'll stage it as follows:

1. Stage and commit a new method and associated unit tests

2. Stage and commit calls to the new method

3. Stage and commit the version update

Having to go through extensive rebasing is really an issue with treating version control as a back up system rather than a way to record individual units of work with well crafted commit messages that describe what and why the change was made.

qudat · on April 30, 2021

Agreed but I also think this depends heavily on context. If the organization cares a lot about clean commit messages, having a nice git history, and wants PRs to be submitted in a clean state, great!

This is not always the case. I've worked on codebases where we rarely scan the git history messages for anything useful, it's just as easy for us to dig into the code to figure out what's wrong.

dyeje · on April 30, 2021

It's a prime example of engineers mistaking aesthetics for utility.

globular-toast · on April 30, 2021

> A complete and utter waste of time. You spend more time messing about with rebase than solving problems.

You won't think that when customers report a regression that was introduced some time in the last year.

preordained · on April 30, 2021

Could not agree more...I wish I had more of substance to add, but an upvote didn't feel sufficient

ginja · on April 30, 2021

I do agree that PRs should have a clean history. However:

- Services like GitHub allow you to restore the original branch, so you never actually lose history. So I don't see any major drawbacks of squashing on merge.

- If your PRs are several thousand lines long, they probably should've been broken up into multiple PRs (your reviewer will appreciate it)

finnh · on April 30, 2021

Spot-on for both counts. Preserving a PR's individual commits in the master branch is insane - many of those commits won't represent a fully working system anyway, so why keep them in master?

gouggoug · on May 1, 2021

It's entirely not insane _on the premise that the history is clean_. That's the whole argument that is being made here.

If you think that "Preserving a PR's individual commits in the master branch is insane - many of those commits won't represent a fully working system anyway", your issue _is_ "that your commits don't represent a fully working system anyway", _that_ is what you need to fix.

You need to make sure your history _does_ represent multiple small, digestible, atomic chunks of code that are understandable individually.

By squashing, you're not fixing the root cause, you're treating the symptoms.

sjoruk · on April 30, 2021

> many of those commits won't represent a fully working system anyway

This is entirely within the control of the developer(s) working on the project. Whether it's worth it is up to the people working on it. It's certainly not insane though - it's much easier to fix merge/rebase conflicts when each commit is small and easy to reason about.

finnh · on April 30, 2021

> it's much easier to fix merge/rebase conflicts when each commit is small and easy to reason about.

I agree, but I'd say it in the context of PRs. It's much easier to fix issues (and avoid them) when _PRs_ are small & easy to reason about.

pizza234 · on April 30, 2021

> - Services like GitHub allow you to restore the original branch, so you never actually lose history. So I don't see any major drawbacks of squashing on merge.

There's definitely a drawback, and it's bisecting, which is actually a big deal.

Bisecting allows narrowing down, in a semi-automated way (depending on the issue), what can otherwise be a large diff.

But of course, it requires a disciplined history - otherwise, bisecting just won't work.

marcinzm · on April 30, 2021

Can't you just bisect twice. Once to find the PR and a second time within the PR's branch?

pizza234 · on May 2, 2021

AFAIK, if you squash merge, the SCM (intended as the repository, both local and remote) will keep only a single commit; the commits breakdown will be in GitHub's (intended as a service) history.

gouggoug · on May 1, 2021

So, if your PR has a clean history in the first place (clean as in multiple small digestible, atomic commits, that are easy to review in isolation), why get rid of it with a squash?

Is there a fundamental drawback to keeping that clean history? _It's already here_

scott0129 · on April 30, 2021

I would agree with your points if not for the fact that "clean up commits" or "typo fix" is a necessary result of PRs.

Your teammate will request changes in your code, and the only way to cleanly communicate "yes I made that change, and ONLY that change" is through these clean-up commits.

Otherwise, if you amend/force-push or open an entirely new PR, 99% of the diff are things that your team has already seen and reviewed.

Squash merges let you clearly communicate how you addressed PR comments, while also keeping the master history clean.

leafmeal · on April 30, 2021

Fixup commits are a good way to work around this. https://git-scm.com/docs/git-commit#Documentation/git-commit...

You still create your "clean up" commits, but it's done in a way where git can automatically squash everything back together in the end.

See also https://git-scm.com/docs/git-rebase#Documentation/git-rebase...
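A minimal sketch of the fixup flow in a throwaway repository (file names and messages are invented). `GIT_SEQUENCE_EDITOR=true` is only there so the example runs without opening an editor; interactively you would just review the generated todo list:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > base.txt
git add base.txt
git commit -q -m "base"

echo helo > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
target=$(git rev-parse HEAD)

echo other > other.txt
git add other.txt
git commit -q -m "Add other file"

# Review feedback: fix the typo, recorded as a fixup of the original commit.
# Reviewers see this as its own small commit until the final rebase.
echo hello > greeting.txt
git commit -q -a --fixup "$target"

# Before merging: let git fold every "fixup!" commit into its target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format=%s
```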

maxioatic · on April 30, 2021

Huh, never knew about this. Seems quite useful. Thanks!

jcelerier · on April 30, 2021

> Otherwise, if you amend/force-push or open an entirely new PR, 99% of the diff are things that your team has already seen and reviewed.

gerrit has solved this issue for years by showing the diffs between each successive revision of a patch.

e.g. look here at the files at different patchsets: https://codereview.qt-project.org/c/qt/qtwayland/+/321246/3....

jonhohle · on April 30, 2021

This was a feature of ReviewBoard as well. The history of code review changes was maintained by the tool separately from commit history.

pjc50 · on April 30, 2021

It took me a while to get a workflow that actually works with Gerrit, and I occasionally think of trying to do some kind of autosquash so I can have successive commits locally. In practice I just use --amend all the time.

Careful use of setting upstream and of course pull=rebase makes keeping up with trunk manageable.

piotrkaminski · on May 1, 2021

And as it happens, Reviewable also deals with this perfectly fine. In fact, it appears that most any code review tool save GitHub is perfectly all right with amend/force-push...

servilio · on May 1, 2021

You can also use git-range-diff[1].

[1] http://git-scm.com/docs/git-range-diff
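Roughly how that can be used after an amend, sketched in a toy repository (needs git 2.19 or newer; the file and messages are invented). It relies on the old commit still being available by hash, for example from the reflog:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > base.txt
git add base.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# First version of the change, as originally pushed for review.
printf 'step one\nstep two\nstep thre\n' > feature.txt
git add feature.txt
git commit -q -m "Add feature"
old=$(git rev-parse HEAD)

# Address a review comment by amending, as you would before a force-push.
printf 'step one\nstep two\nstep three\n' > feature.txt
git commit -q -a --amend --no-edit
new=$(git rev-parse HEAD)

# Compare the old and new versions of the branch, both relative to base.
out=$(git range-diff "$base" "$old" "$new")
echo "$out"
```

The output lists each old commit next to its new counterpart, with a diff of the two diffs where they differ, so a reviewer can see only what changed since the last round.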

jcelerier · on May 1, 2021

how does that work if the old version of the amended commit has been gc'd ?

mrkeen · on April 30, 2021

I don't mind this. I want my reviewers to look at the state of the new code - not the diff, and not the diff of the diff.

Address PR comments using the PR comments.

Cymen · on April 30, 2021

I am in agreement however I suspect many developers have never seen the usage of "git bisect" to track down a bug and fix it. Once you see the power of that, I think one can come to appreciate more the granular git history that is present when not using squash commits.

Of course, git bisect still works with squash commits; it just makes your job as the bug fixer much harder because typically squashed commits are quite large, so you have to figure out what in the N lines of code introduced the defect.

tharkun__ · on April 30, 2021

I would beg to differ. How large your commits are, depends on your tickets. As I've just explained in another reply, squash merging with rebases does not have any direct correlation with how large your commits are or whether you can use long lived feature branches.

We do sometimes use feature branches for example (very sparingly) but then that feature branch becomes a temporary 'master' that we squash merge to and the feature branch does _not_ get squashed but the individual commits suddenly all appear on master after rebasing, which results in a nice fast-forward 'merge'. Those feature branches are the result of multiple tickets which all have their own individual commits and which could all have been made directly against master using the squash merge strategy, except there were 'reasons' not to (which I usually try to dispel and work with master directly but it's not always possible).

asimpletune · on April 30, 2021

Hey, so this POV comes up a lot, and I have to say that I think it mistakes how git commits should work, but I’ll add that you can sort of have both.

First, in a production branch, git commits should be thought of as functions. Like “Apply commit X, get feature Y, unapply it and you get the reverse”. So the problem with preserving full git history in master is that it breaks that invariant. You have to sort of do like a range of commits, but then that doesn’t really always work because often times other commits can be interleaved into yours.

To understand how I mean, just look at Linux or open source software projects, where the technical experts are more or less gatekeepers and aren’t accountable to any other influence. You’ll find that commits work this way, and their history is squashed. (Maintainers will also force you to rebase before merging and basically put all the work on you to get the PR in ship shape, which is a lot different from a corporate environment)

Ok, but then to your point about preserving valuable, more granular commits, well, the solution is you just leave that remote branch up, either in your fork or elsewhere (but probably in your fork). These branches should have formal names identifying a ticket (at work) or an RFC or whatever, so it should be easy for people to discover what happened. They can see this remote issued a PR to this remote to implement RFC-123 or whatever, and they can go to that remote and see the granular commits if you preserved them.

Sorry, this is like a never ending debate and there are very strong opinions, but I sincerely think that in this situation there is actually a right answer. People who already know how to do this or are super familiar with open source, maintainers or team leads or whatever, I think don’t see the point in fighting over it, since they can just enforce whatever they want in their gatekeeper role. However, the truth is that there’s sort of a fundamental misunderstanding that the majority of the population has around git and I think it’s better to engage and explain when you get the chance.

gouggoug · on May 1, 2021

> First, in a production branch, git commits should be thought of as functions. Like “Apply commit X, get feature Y, unapply it and you get the reverse”. So the problem with preserving full git history in master is that it breaks that invariant. You have to sort of do like a range of commits, but then that doesn’t really always work because often times other commits can be interleaved into yours.

Merge commits allow you to do exactly that. If you have a branch with, say 15 commits, you can merge that branch _with a merge commit_, preserving all 15 commits, but if you need to revert, you revert the merge, which automatically reverts all 15 commits.
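A small sketch of that, using three commits instead of 15 and a throwaway repo (branch and file names are hypothetical):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git branch -M main

# A feature branch with three separate commits.
git checkout -q -b feature
for i in 1 2 3; do
  echo "part $i" > "feature-$i.txt"
  git add "feature-$i.txt"
  git commit -q -m "Feature part $i"
done

# Merge with an explicit merge commit so all three commits are preserved.
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Revert the whole merge in one step; -m 1 keeps the mainline (first parent) side.
git revert --no-edit -m 1 HEAD >/dev/null

git log --oneline --graph
```

One caveat worth knowing: after reverting a merge, merging the same branch again does not bring the changes back until the revert itself is reverted.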

> Ok, but then to your point about preserving valuable, more granular commits, well, the solution is you just leave that remote branch up

So you went through all the trouble of making your git history something clean and valuable. And now, instead of merging it as-is with a merge commit into main, you squash the entire history (that you literally just spent time making useful) into main.

Now, because you think you might need this history (that you spent time making pretty and valuable), you come up with a whole process of creating a remote branch, giving it a name that follows specific conventions and whatnot, you also make the decision that you'll keep these branches forever, and, you need a way to link that remote branch to the squashed history.

What is the point of this whole process? When you could simply merge your feature branch (with its pretty and useful history you spent time working on) in main and be done with it?

I have a feeling this whole process you came up with stems from the fact that you don't seem to be aware that you can revert a merge commit.

asimpletune · on May 1, 2021

No I understand, I’m just saying in the real world that’s not how projects are maintained.

FunnyLookinHat · on April 30, 2021

> Squash merge has the major disadvantage of getting rid of valuable meaningful git history.

I think the main point of the article is that the Git history is often *not* valuable (because it's hard to enforce good commit messages). Your later points are very valid, however, and that's why I argue for smaller pull requests (and never merging a PR into main that leaves it in a broken state).

streblo · on April 30, 2021

> Developers should be required to submit _clean_ PRs, that is, PR's whose git history has been organized and refactored in such a way that it removed "clean up commits", "typo fix", etc.

The only way I've ever seen this accomplished is through Github squash and merge.

drtz · on April 30, 2021

> When you squash merge a feature branch that has thousands of lines of code, and 6 months later you have a bug introduced by this feature branch, it becomes extremely hard to find which line introduced the bug.

The article makes an argument that each commit should be "a single logical change." Squashing thousands of lines of code from long-running feature branches into a single commit is obviously not what the author is suggesting.

edoceo · on April 30, 2021

maybe it's not 100% one way or the other?

I love squash for some things, but not others

today I brought in one big merge, 20+ commits, so don't squash it.

but I also have 20 "little fix" branches, with 1 or 2 commits each, which I merge all together and squash in as one merge to main.

sevencolors · on April 30, 2021

This is exactly how i run my project.

Amazing that folks get into these "two sides" debate. The reality is always a grey area. And you should be flexible enough to see the benefits of the different variations. With some guidelines on how to make a decision.

PaulDavisThe1st · on April 30, 2021

hah, a big merge. in the next month or so, i will need to merge a branch with nearly 500 commits that change more than 15k lines of code, under development for more than a year.

but yeah, like your "one big merge", i do not intend to squash it (though there are actually some arguments in favor that only kick in with a merge of this size).

tharkun__ · on May 1, 2021

I would argue that this feature branch of yours is way too long lived :) But I get it, sometimes you can't change some things in a corporate environment.

I would like to know though, if those 500 commits are all individual PRs/tickets that originally might have been 3798 commits but each PR that got merged into the feature branch was squashed? Or are these 500 individual commits that are potentially interleaved, meaning commit #246 is for ticket ABC, commit #247 to #251 is ticket XYZ, #252 to #255 is ticket ABC again etc.?

I'm basically re-commenting here, but if those 500 commits are exactly the commits you would have made to master using squash-merge, had you not needed a long-lived feature branch for some reason, then it's fine to rebase the feature branch on master and do a fast-forward 'merge' that makes all of those 500 commits suddenly appear on master (fine, except for the fact that you had a _one year_ feature branch :))

PaulDavisThe1st · on May 1, 2021

I'm not in a corporate environment. ardour.org is it.

The 500 commits cover a development process that radically alters a fundamental data representation within the software. It was difficult to design, difficult to implement, difficult to test. The whole process has simply taken a long time, and we've had other orthogonal development taking place in master.

tharkun__ · on May 1, 2021

You are definitely in a different world. As a dev I really like SaaS as incremental changes are much easier to do. I bet you've weighed the pros and cons of making those changes incrementally vs. as a big bang change.

Ozzie_osman · on April 30, 2021

> When you squash merge a feature branch that has thousands of lines of code, and 6 months later you have a bug introduced by this feature branch, it becomes extremely hard to find which line introduced the bug.

I try to avoid feature branches with "thousands of lines of code" on most of my teams, and have been pretty successful. Those types of feature branches create a lot of other problems. On the other hand, small, incremental pull requests that get merged back to master and have really short lifespans, along with things like feature flags to decouple delivering code from delivering functionality have worked really well.

In this world, squash merges are awesome, because any squash merge is basically a "commit" in the other world, and developers can feel free to commit however they want within the branch.

cortesoft · on April 30, 2021

> When you squash merge a feature branch that has thousands of lines of code, and 6 months later you have a bug introduced by this feature branch, it becomes extremely hard to find which line introduced the bug.

So, the only way that I can think that having the squash commit broken into individual commits would help to find the one broken line is because it would enable git bisect to find the failing commit.

However, that implies that each individual commit inside that feature branch worked on its own. If that is the case, a better suggestion would be to break up the giant feature branch into smaller sub features that can be merged as they are completed

nemetroid · on April 30, 2021

Even though most work can be split into a sequence of valid commits (i.e. sub-features), it's often not obvious how to best break up the larger feature before working on it.

cortesoft · on May 1, 2021

Sure, but I don’t see how that changes the situation. There are three possibilities; you make a series of commits that are each complete and can run on their own, you make a series of commits that can’t be run on their own, or you make one big commit at the end with the full feature.

If you are doing option 1, you could make one PR per commit, and each one could be squashed onto the main branch (although a single commit squash doesn’t do much). You can then use git bisect at a later time to find out which of those commits broke something.

If you make a series of smaller, but non-functional commits, you won’t be able to use git bisect no matter if you squash them or not. If you squash them, git bisect will only tell you that the whole series of commits introduced a bug or not. If you don’t squash, git bisect won’t work because the build and/or tests will fail on the non-functional commits. You don’t gain better discoverability by not-squashing.

The third option of one big commit makes squashing a moot point. Either way you only get one commit.

My point is that you don’t get better discovery power by not squashing.

nemetroid · on May 1, 2021

> you could make one PR per commit, and each one could be squashed onto the main branch (although a single commit squash doesn’t do much). You can then use git bisect at a later time to find out which of those commits broke something.

You can, and I've tried doing this (both writing code and as a reviewer), but I've found it to mostly have disadvantages compared to a single merge request with all the commits:

* The reviewer gets less context for the earlier commits. Usually, a sequence of commits like this starts out by doing refactoring/redesigning of existing code, with the later commits adding new functionality. If the earlier commits add a new abstraction, it's much easier to evaluate if it's a good design or not if you get to see the actual use of the new abstraction (in the later commits).

* It takes more time, both for the reviewer and the submitter. For the reviewer, each "iteration" of the review might be shorter, but there will be more iterations in total, since they're only looking at a subset of the changes at once. If the reviewer only looks at merge requests a few times per day (to strike a balance between being responsive and avoiding interruptions), this translates to longer total time.

Sometimes it does make more sense to do. If the total set of changes is very large, it's probably a good idea to split it up (though not necessarily to single commits). Likewise, if your early commits make fundamental changes to the design of the code and might need a complete do-over after review, it might make more sense to present that part separately (though see the first item above). But in my experience, it usually just leads to more work and more context switching for everyone involved.

> If you make a series of smaller, but non-functional commits, you won’t be able to use git bisect no matter if you squash them or not.

I agree that this is the worst of all options.

eplanit · on April 30, 2021

I see your points, but I disagree. FWIW, I also adopted the squash-merge only strategy with no regrets.

> Squash merge has the major disadvantage of getting rid of valuable meaningful git history.

It's not lost, at all -- it's in the branch it was squash-merged from. The point is that the topic/fix/experimental branch will likely have lots of history, including discarded ideas and other 'meanderings'. Every commit in that branch is meaningful in that it was a discrete step in that path. But, eventually what is _finally_ in that branch is what is relevant when merging to master. All that extra history in master would create confusion (and has for many in projects I've worked on).

If the discipline is that every commit must be a meaningful, viable point _in master_, then a merge of the history would make sense -- but I've never seen that possible (or even desired) in practice.

codesnik · on April 30, 2021

Mandatory squash merge would be very bad, right. But it sounds like the success of this policy is highly dependent on the code size of the average PR. I really liked working at a place where commits were usually squash-rebased, which got rid of most "typo"s, but long-lived huge feature branches lived a mostly usual life. And if possible, some logically atomic and finished groundwork parts of feature branches were extracted and squash-rebased into master ahead of time, slimming the feature branch, sometimes to the point that the feature branch could be squash-rebased too. Git blame was VERY pleasant to work with, and git-bisect would actually work if the need arose.

tharkun__ · on April 30, 2021

Using squash-merge doesn't mean that you can't use long-lived feature branches if that's something that you need for whatever reason.

It just means that you basically have a multi-stage squash merge strategy. We do this from time to time. We only allow rebased squash-merge to master. When we do have a long-lived feature branch for something then this feature branch basically becomes the master for the individual ticket branches and at the end, the feature branch is rebased onto master and merged as a fast forward. It can result in 50 commits appearing on master all at once but each of those commits is an individual small commit just as if they had been done directly with master and the squash-merge strategy. We rebase this feature branch on master very regularly too (once per sprint actually) to keep up to date with master. We also try not to do any long lived feature branches in the first place.
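As a sketch only (this is not our actual tooling; the ticket names, branch names and files are invented), the multi-stage flow looks roughly like this in a throwaway repo:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git branch -M main

# The long-lived feature branch acts as a temporary "master".
git checkout -q -b feature

# Each ticket branch has messy commits but lands on feature as one squashed commit.
for ticket in ABC-1 ABC-2; do
  git checkout -q -b "$ticket" feature
  echo "wip" > "$ticket.txt"
  git add "$ticket.txt"
  git commit -q -m "wip"
  echo "done" > "$ticket.txt"
  git commit -q -am "fix typo"
  git checkout -q feature
  git merge -q --squash "$ticket"
  git commit -q -m "$ticket: implement ticket"
done

# Meanwhile main moves on.
git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "ABC-3: unrelated change on main"

# Rebase feature onto main, then fast-forward main: no merge commit, no second squash.
git checkout -q feature
git rebase -q main
git checkout -q main
git merge -q --ff-only feature

git log --oneline
```

The resulting main history is linear, with one commit per ticket, exactly as if each ticket had been squash-merged to main directly.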

bennysomething · on April 30, 2021

Disagree, having a bunch of commits made up of undo, redo, dunno what I'm doing, oh no now I know, commits is pretty annoying. Squash solves that (for me, when dealing with other people's merges). Obviously just my opinion etc

marton78 · on April 30, 2021

The problem is real and the parent commenter addresses it: learn interactive rebasing to craft a beautiful commit history.

lifeisstillgood · on April 30, 2021

I think you are using the git commit log as a form of documentation.

I half-agree. I don't agree devs should spend (much) time cleaning up their logs as opposed to actually writing docs (inline) that help the next person

I do agree that there is rarely a good (even half good) history of decisions. This is never the Jira / tickets kept outside of the system.

But it is also no good abusing the commit log as a form of ... journal. (I know that sounds silly but )

I think an actual journal / blog kept by the dev lead will be much more useful

oftenwrong · on April 30, 2021

There's definitely value in putting documentation where it will be seen. For developers, writing in a code comment or commit message is a good bet. If there's an issue, they'll be in the code, and they'll be in the commit log, and they'll see what you've written.

At times I've been asked to document things in a company wiki. Typically nobody reads it and it never gets updated when others make changes to the code.

jtchang · on April 30, 2021

> Squash merge has the major disadvantage of getting rid of valuable meaningful git history.

There are reasons against it but the real world is messy. Developers tend to commit things, recommit, undo, redo, move things around. Is this valuable history? It can be but I would say 99% of the time it's more valuable to have a good commit message about what was intended rather than what actually happened.

The idea of squash merge is to make the history clean from the get-go as you said.

Izkata · on May 1, 2021

> Developers tend to commit things, recommit, undo, redo, move things around.

I see 2-3 distinct final commits in that list there, ones that should be separate and not squashed into one:

* commit, recommit, undo (squash these ones, and depending on how much "undo" does could eliminate this entirely)

* redo (what exactly is in this one depends on how much "undo" did in the prior, and how the "redo" was done)

* move things around (feature is seemingly already working and this is a distinct separate "improve the code" step)

> Is this valuable history?

Keeping that last one separate is absurdly valuable when trying to figure out why the code acts the way it does, when a bug is reported later. A lot of bugs I encounter were in those steps when someone makes a simple typo moving the code around, and having it in the history makes it obvious it wasn't an intended change.

eximius · on April 30, 2021

If only there was a way to have our cake and eat it too.

I'm assuming there is because I'm assuming those squashed commits are floating somewhere in the reflog and somehow connected to the squashed commits in a way that should still let you bisect if only we had a tool to make it not suck.

Alas, maybe someone with greater git-fu can inform us.

rubyist5eva · on April 30, 2021

In the article he says he keeps the original branch with all of the history around after they squash them with a reference to the original PR in the squashed merge commit message. So you can always just checkout the original branch and go digging in the full history.

waffletower · on April 30, 2021

Unwieldy to keep branches around even on moderate sized teams/projects. You don't need them to have a `squashed` view of history when needed. I am surprised that developers still cling to this outdated squashing regimen when pull request tools found on github, gitlab, bitbucket etc. already provide synthetic squash views derived from atomic commits by default.

rubyist5eva · on April 30, 2021

How is it unwieldy? They don't even show up locally when you `git branch` unless you've checked them out. Otherwise they just sit there doing nothing, there is literally no maintenance required on them once they've been squashed into mainline. Most central repositories end up having hundreds (or even thousands) of "finished" branches that nobody ever looks at anyway.

waffletower · on April 30, 2021

Depends on configuration if you wind up having that many. I use `git branch -v` often to figure out what branch a colleague is working on. I guess I could apply my same logic and use filtering tools here as well. But I don't see the benefit of squashing the merge commits when git CLI, github etc. can give you the same view without the gratuitous data loss.

beberlei · on May 1, 2021

They show Github in their screenshots, and on Github you can delete a branch and the original commit history in the UI is still kept and available. So you don't have the branches locally, but Github still has them around.

waffletower · on April 30, 2021

There is, as I answered separately:

`git log -p --first-parent`
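To make that concrete, here is a sketch in a throwaway repo (the `topic` branch, `notes.txt` and the commit messages are hypothetical). The branch is merged with a real merge commit, and `--first-parent` then gives the squashed-looking view on demand:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > notes.txt
git add notes.txt
git commit -q -m "Base commit"
git branch -M main

# A topic branch with the usual noisy commits.
git checkout -q -b topic
for step in "wip" "fix typo" "address review"; do
  echo "$step" >> notes.txt
  git commit -q -am "$step"
done

git checkout -q main
git merge -q --no-ff -m "Merge topic: add notes" topic

echo "Full history:"
git log --oneline

echo "Squashed view (first parents only):"
git log --oneline --first-parent

# Same view with patches; on git 2.31+ the merge is shown as one combined diff.
git log -p --first-parent
```

On older git versions, adding `-m` to the last command should give the same per-merge diff.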

chris_wot · on May 1, 2021

OpenOffice did what looks a bit like a "squash merge" way back in the day. It's a nightmare to work out where things got changed. I don't recommend it.

I quite agree. Fix your commit history via an interactive rebase before you push. Learn to love fixup.
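A minimal fixup sketch, assuming a throwaway repo and made-up file names. `GIT_SEQUENCE_EDITOR=true` is only there so the example runs without opening an editor; normally you would review the todo list yourself.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git branch -M main

git checkout -q -b topic
echo "prase input" > parser.txt   # typo on purpose
git add parser.txt
git commit -q -m "Add parser"
target=$(git rev-parse HEAD)

echo "render output" > render.txt
git add render.txt
git commit -q -m "Add renderer"

# Fix the typo and record it as a fixup of the earlier commit.
echo "parse input" > parser.txt
git commit -q -a --fixup "$target"

# Fold the fixup into its target. The sequence editor "true" accepts the
# generated todo list unchanged, so the rebase runs non-interactively.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..topic
```

What gets pushed is two clean commits ("Add parser", "Add renderer") with the typo fix folded into the first one.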

garmaine · on April 30, 2021

> On the other hand, if you kept the history, and if this history was clean from the get go, it becomes easy to read the commits one-by-one and understand the issue.

Or git-bisect.

gouggoug · on April 30, 2021

Yes, git bisect.

But what does git bisect require to be useful?

A set of small atomic commits that make it easy to identify which line of code might be the origin of a very subtle bug you're looking for. Squash+merge will not provide you with such history, and make bisecting much less useful.

MarkSweep · on May 1, 2021

As long as CI is run on all commits, I’ll agree.

If you don’t run CI on every commit, then you have to balance nicer commits against a higher likelihood of bisecting not working as easily.

hinkley · on April 30, 2021

Squash merge is for people who are so wedded to the idea that you should guess why the code is the way that it is instead of looking it up, that they want to make sure nobody else can do it either.

You know, assholes.

It’s fine if you don’t want to use something, or use it, as long as it doesn’t drag your whole team into your decision. Which squash merge does.

nickbauman · on April 30, 2021

Generally, time spent twiddling with the repo is time not spent delivering code. It's a distraction. Yes, git has all these features that let you do that, and those features matter when you're committing to the Linux repo, which has thousands of eyes and where your commit history has to help you communicate to a very wide audience. But the vast majority of us are not using git like this. I've used git bisect so rarely that all this is overkill.

gouggoug · on April 30, 2021

- Generally time spent writing documentation is time not spent delivering code.

- Generally time spent commenting code is time not spent delivering code.

- Generally time spent diagramming on a white board is time not spent delivering code.

- Generally time spent writing specs is time not spent delivering code.

Yet, doing all of these are actually extremely important. How much importance you give each one of them is up to you that's for sure.

So while "Generally, time spent twiddling with the repo is time not spent delivering code" is true, twiddling is nonetheless important, and the statement disregards the fact that "twiddling" usually only takes a few minutes.

nickbauman · on May 13, 2021

I've never seen documentation efforts pay off in organizations that do not have documentation as a deliverable artifact to outside stakeholders.

I've never used commit comments to store important information about the systems I'm working on. OTOH I can see people working on Linux needing that a lot more than I do.

Feel free to twiddle if that floats your boat though. I will refrain from twiddling, thank you.

bonzini · on April 30, 2021

Time spent twiddling with the repo is time saved in the future debugging or writing documentation.

You can have large refactoring PRs for which splitting them further really makes little sense(*), but that have a huge risk of introducing regressions. Being able to bisect them is much easier with a properly maintained repository.

(*) And then you spend time twiddling with GitHub, which is the same as twiddling with the repo except with worse tools.

nemetroid · on April 30, 2021

Time spent "twiddling" with the repo is time spent documenting the business reasons for code changes. Depending on what type of code you're writing, this might be not-so-important or massively important for future understandability.

ziml77 · on May 1, 2021

Business reasons for code changes are kept in JIRA tickets and in merge requests. Comments might also explain business or technical reasons for certain chunks of code.

Commits are just logs of the units of work done to support completing those tasks. They often don't have any real logic for where they're broken up except that it happens to compile or that I want a checkpoint that can be stored remotely for safety.

nemetroid · on May 1, 2021

> Business reasons for code changes are kept in JIRA tickets and in merge requests.

It's fine for JIRA tickets or merge requests to have the meat of the details. So this commit message:

  PROJ-431: split function

is much better than just:

  split function

because although git blame does not give me the reasoning directly, at least I can read the ticket and hopefully understand it in minutes. In the latter case, there might be some later commit within the merge request that does refer to the ticket number, but Git does not have a quick way of finding later commits, so you might have to muck around with the log for some time to find the actual ticket reference.

But with a commit message like:

  PROJ-431: Separate base lookup and filtering
  
  For the Foobar customer, this data needs some special aggregation
  after lookup but before sending it to the main filtering function.

, it takes literally seconds between seeing a curious line of code and understanding why it was put there. Depending on how often a reader of the code has to do this, it may or may not be worth the effort. But in my experience working on decades-old projects with thousands or tens of thousands of commits, it makes a significant difference to productivity and the rate at which a new developer understands the code.

_dw7s · on April 30, 2021

The older I get the more I find these discussions as counterproductive as figuring out where to put the bike shed.

I worked in teams that did squash merge and in teams that didn't. And in teams where some did and some didn't, on the same repo. In the grand scheme of things it didn't matter, except for hardliners who had nothing better to talk about.

scubbo · on April 30, 2021

I agree with your point, but, tangentially, I do find it amusing that "where to put the bike shed" is actually _more_ impactful than what I thought "bike-shedding" referred to (arguing about what _colour_ to paint the bike shed for a nuclear reactor). At least the location of the bike shed actually has some (small) effect on people's commute!

marcodave · on May 1, 2021

that's some meta-bikeshedding over there (AKA bikeshedding the concept of bikeshedding), kudos to that!

jakeva · on April 30, 2021

> figuring out where to put the bike shed.

Anyway, the _real_ question is what color to paint the bike shed?

allenu · on April 30, 2021

Yep, I agree. In my ideal world, everybody (including me) would have pristine commits that are broken up logically. There wouldn't be small commits created as part of code review to fix things up based on feedback. You'd just have a single "Add feature X" or "Fix bug Y".

However, it's an imperfect world and there are trade-offs. It's hard to train people to do things the right way, and then policing when they do it the wrong way or putting roadblocks in place to make them do it the right way takes energy, and often the benefit is minor. It really depends on how frequently you pore through your history to find an offending bug and a whole bunch of other team policy that isn't directly related to git commits.

I used to spend a lot of energy rebasing so that I could have a beautiful history, but honestly it hasn't benefited me that much. I still rebase, but I'm not so strict about it on my commits, nor others. I'd rather spend my focus and energy elsewhere.

Just do what the team is comfortable with and be flexible.

u801e · on May 1, 2021

The question really is whether the team values version control history with good commit messages. A simple test would be to make all commits with the --allow-empty and --allow-empty-message flags which allow you to make commits that don't change any files in the work tree and don't include a commit message.

If the team doesn't see an issue with it, then using those flags should be mandated, since what the team is really interested in is just snapshots of the codebase. Any associated message (the commit message) is effectively useless and ignored, so why bother even including it?
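For reference, a quick sketch of what those two flags do, in a throwaway repo (nothing here comes from a real project):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A commit with a message but no file changes.
git commit -q --allow-empty -m "Empty commit with a message"

# A commit with neither file changes nor a message.
git commit -q --allow-empty --allow-empty-message -m ""

# Show each commit's short hash and subject; the second subject is blank.
git log --format='%h [%s]'
```

Both commits point at the same tree, so the second one adds nothing to the history except its position in it.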

marcodave · on May 1, 2021

not only getting older, also getting closer to the project manager's point of view gives a different perspective on what does really matter for the project. Spoiler alert, commit history most likely does not.

deathanatos · on April 30, 2021

Squash & merge is objectively worse than rebase && merge --no-ff.¹ (Roughly what the article calls "no fast-forward".) "No fast-forward" meets all of the criteria the article's author proposes:

> 1. Combines all the code changes related to a single logical change

Yes: the merge commit is that.

> 2. Provides an explanatory commit message that helps people understand the intent of the change

This is no more or less true than squash & merge. (Although, I don't off the top of my head remember how good the automatic message in Github is.) But that's more a problem of an automatic message than it is the merge strategy, & squash and merge also has this. (I've seen numerous squash commits with "Fix CI, Fix CI, Code formatting, Fix Lint warning" in them…)

Good commit messages boil down to the discipline of the coder. (And a reviewer being able to say, "Can you write a better commit message?".)

> 3. If you pick this commit independently from the history, it makes sense on its own

The merge commit.

What you lose with squash & merge is the history. The author does sort of address this:

> In case you are wondering if we are losing the individual changes, the answer is no. Each squash merge references back to a PR where the whole changes are tracked:

And while this is technically true, it's a reference only in the textual message (which Github will nicely turn into a link, but you must be in Github for that). "No fast-forward" will maintain those references in the git commit parent information, which means that tooling like git bisect should be able to see into it. But with a squashed commit, the closest you get is "some commit in this PR", essentially. Same with reverts: if just one commit on a feature branch is bad, you can simply revert that one commit. (Or, if most of the branch is bad, you can revert the merge commit & cherry-pick the good bits.)

If you don't want to see all the feature branch commits in the history, you can just follow first parents.

¹I prefer a quick rebase prior to merge but after code-review, as it is a good balance between the resulting history being readable, and not rewriting history while your reviewer is looking at it. But the argument here should hold regardless; the definition the article uses is sufficient, too.
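A sketch of the "revert just one commit" case, assuming a throwaway repo with a three-commit feature branch (the `api`/`cache`/`ui` names are made up):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git branch -M main

git checkout -q -b feature
for part in api cache ui; do
  echo "$part" > "$part.txt"
  git add "$part.txt"
  git commit -q -m "Add $part"
done

# Merge with --no-ff so the branch commits stay reachable as second-parent history.
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature

# Suppose only the cache commit turned out to be bad: revert just that one.
bad=$(git log --format=%H --grep='^Add cache$' -1)
git revert --no-edit "$bad" >/dev/null

git log --oneline --graph
```

With a squashed commit there is no `Add cache` to point at; the choice would be reverting the whole PR or hand-writing a partial undo.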

f154hfds · on April 30, 2021

My org has started to strongly recommend squashing before merging (in my opinion one step less extreme than the forced squash-merge mentioned in the article). I tend to consider this a decent principle in general but rules are made to be broken.

My main concern is if anyone ever commits based off of a pre-squashed branch. They won't be able to simply merge upstream after their parent has been squashed; they will now have to cherry-pick or incur strange redundant merge conflicts, as they no longer share history.

For a small team whose features tend to be short-lived before going upstream this won't likely be a problem but believe me, if you ever need long-lived feature branches on a larger team, squashing them can cause more trouble than your nice history will gain.
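One way out of that situation, sketched in a throwaway repo, is `git rebase --onto`, which replays only the child's own commits on top of the squashed result. The branch names `parent` and `child` are hypothetical, and this assumes the old parent tip is still available locally.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git branch -M main

# parent: two commits that touch the same file.
git checkout -q -b parent
echo "v1" > shared.txt
git add shared.txt
git commit -q -m "Parent: first pass"
echo "v2" > shared.txt
git commit -q -am "Parent: second pass"

# child: work stacked on top of the unsquashed parent.
git checkout -q -b child
echo "child work" > child.txt
git add child.txt
git commit -q -m "Child: add follow-up"

# parent lands on main as a single squashed commit.
git checkout -q main
git merge -q --squash parent
git commit -q -m "Parent feature (squashed)"

# Replay only the commits in parent..child onto the new main.
git rebase -q --onto main parent child

git log --oneline
```

A plain `git rebase main` from `child` would instead try to replay the parent's original commits too, which is where the redundant conflicts can come from.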

u801e · on May 1, 2021

With squash merge, Github will either:

1. Create a single commit and a merge commit that's empty because the single commit's parent is the current HEAD commit of the master branch

2. Create a single commit and a merge commit that has a diff because the single commit's parent isn't the HEAD commit of the master branch

Scenario 1 essentially doubles the number of commits in the history, and the information in the merge commit message could easily be included in the squash commit message by amending it.

Scenario 2 does the same thing, but could easily be changed to scenario 1 by rebasing on the HEAD commit of master before doing the squash merge.

The fundamental problem with squash merging is that, depending on the scope of the PR change, you basically end up with a large commit that contains many changes that really ought to be separate individual commits. Those large commits are difficult to revert due to the sheer number of conflicts one will encounter when trying to revert them.

On the other hand, well crafted individual commits can easily be reverted and are less likely to result in a conflict. Even if there are conflicts, they are limited in scope and can easily be dealt with since the commit itself doesn't really touch as many lines of code.

lamontcg · on May 1, 2021

> You can use short-living branches to avoid repetitive merge of master

> No, this doesn't work. It does if you have very few developers, each working on individual branches. But when multiple developers are working on multiple-feature branches together, that doesn't scale. We encourage backporting master often into your branch to limit the risk of conflicts, and stay on top of the latest changes. For example, we continuously update dependencies. We also merge and ship on average 10 times a day.

Those are development branches that should be treated like master/main and developers should be writing short feature branches which are merged into those branches.

And in the simplest case when you start work on N+1 then that becomes your main branch, and you fork off the stable N branch. Both of those are treated the same and devs should do PRs into them. Those PRs should ideally be simple things that are easy to review which should be single commits. This enables squash+rebase on those daily private branches.

You can also fork off a nuclear reactor branch which is long-lived, occasionally merge back from the main branch and then merge into main later. I'd discourage that since you have the issues of merging and porting, potentially both ways. But again you shouldn't have devs directly committing to that branch and then pushing; they should be opening PRs of their work into it. When it's merged, other devs can pull that branch and rebase their local work on top of it.

I've helped out individual devs with commits on top of their feature branches from time to time, but I consider that they've got the ultimate "write lock" on those branches. Typically, as a result of actually chatting with them in Zoom, sometimes it might be faster to just push a commit with the work. But once that is done, it's their private branch again, and I expect them to rebase and for me to have to deal with that if I look at it again.

EDIT: Also the long-lived feature branches are going to be impossible to adequately review when it is time to merge them back into main. That code review needs to be happening all along the way if that is your process. You can't adequately review 4 months of work by several devs once at the end of it all. They need to be doing that the whole time. That calls for a branch they're doing their own PRs and code review into.

Qerub · on April 30, 2021

It would be nice if GitHub allowed comments on commit messages and not only the changes so that they could be discussed for the benefit of learning and improvement.

Squash merge commit messages are currently not reviewable at all since they are not entered until just before the merge.

I'd prefer for nothing at all to be able to enter the main branch without review, if not for anything else to protect myself against my own mistakes.

Guess I should give Gerrit a shot.

hmsimha · on April 30, 2021

This is great! The takeaway for me is that with squash merge you get one commit with all the changes which (optimally) has the full context in the commit message.

My typical workflow is basically the same, but with putting all that context in the merge commit. This allows you to find it with a bit of work (blame to find the commit line, then figure out where that was merged in, then find the merge commit). Squash merge puts all that context in one commit, and keeps a linear history. I had assumed squash merge squashes the commits, then creates a merge commit still, which would mean you'd probably do something like combining messages for each commit in the squash commit message, then capturing the overview in merge commit.

The article says you can still find the individual commits via PR, which is a minor disadvantage as it means you can only do exploration of these via github. If you've deleted the topic branch on github are they still accessible? If it's been garbage collected by git (or never existed locally if you're looking at someone else's changes), is there a way to check them out?

globular-toast · on April 30, 2021

I don't really care what people do as long as they understand one thing: the entire point of keeping history is to be able to track regressions. Seriously, give me one other reason to not squash master down into a single commit every time I commit anything. If you understand the purpose of history then you'll understand it's important to keep it in order and then you can make a decision about how you keep it in order. Personally, I think squashing branches down leads to unnecessarily large commits. Smaller commits are better.

> Commits are essentially immutable.

Commits are immutable in the strictest possible sense. As is any other object in git. Not only that, you can't delete objects either. Every git repo in the universe together represents one giant, immutable, append-only object store.

u801e · on May 1, 2021

> As is any other object in git. Not only that, you can't delete objects either. Every git repo in the universe together represents one giant, immutable, append-only object store.

To be pedantic, it is possible to delete objects by manually or automatically running the git gc command. Objects that aren't referenced by other objects will eventually be deleted from the git object store.
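
A small sketch of that, assuming a throwaway local repository. The branch name `scratch` is made up, and the reflog is expired explicitly because reflog entries otherwise keep the commit reachable.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Make a commit on a throwaway branch, then delete the branch.
git checkout -q -b scratch
echo tmp >> app.txt
git commit -q -am "Throwaway commit"
orphan=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D scratch

# The commit is still in the object store: HEAD's reflog references it.
if git cat-file -e "$orphan" 2>/dev/null; then before=present; else before=deleted; fi

# Drop the reflog entries, then prune unreachable objects immediately.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then after=present; else after=deleted; fi
echo "before gc: $before, after gc: $after"
```

Without the explicit expiry and `--prune=now`, the default grace periods apply, so the object would normally linger for a while before being removed.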

lmilcin · on May 1, 2021

My biggest issue with squashing everything is that every non-trivial development necessarily consists of three types of changes:

- functional changes (changes functionality as visible from outside),

- refactorings (changes how application works internally)

- reformatting (does not change the compiled paths but improves readability).

Now, there is very good case for keeping these types of changes separate.

For example, when I want to change something I may want to first refactor it (to bring the code to a state where the functional change can be done easier and is more clearly correct), then I will make separate commit/commits to modify the code functionally, then I will possibly follow with more changes to "clean up" -- refactor the code using my newly acquired knowledge.

I always mark refactorings / reformats by starting the commit message with the word 'refactoring' or 'reformatting' so that any reviewer can easily see that these commits are not allowed any functional changes (and if I made one it means I made a mistake).

What this means is that the functional changes are as light and to the point as possible, not burdened with unrelated changes. It makes it much easier to review and ensure they are correct when I "tell" the reviewer which parts of the change are intended to modify functionality and which are just housekeeping.

That information obviously vanishes if you squash everything together.

**

Another big problem with squashing commits is that they become very large.

I find it makes much more sense to compose a large change from smaller, simple, logical, understandable changes that each produce a (hopefully) working application.

If you find a problem with the commit (for example as a result of bisecting to find where something was introduced) it is much easier to understand what the change intended and compare this with the actual code modification to figure out where I failed.

Obviously, when you squash it the information is gone.

Pxtl · on April 30, 2021

Squash was the thing that convinced me that the emperor has no clothes. Realizing that I was going to either have to train every junior, every four-month community-college student brought in on co-op to modify their history in an awful UI with tons of gotchas, or I would have to accept the downsides of squash?

It's so stupid.

Git desperately needs a layer above the commit that groups related commits together into a semantically commit-like object that you can show in history and jump to its HEAD and cherry-pick. Because the squash is a dumb hack, and meticulously editing your history is not productive work.

I want squash. But I want squash without all the boneheaded implementation-detail downsides of squash. I want squash where I can put up a PR and keep working and then not have to deal with the cherry-pick pain if I want to build off that work after it merges. I want squash where I can leave the branch up after the merge and still see that it's behind the main branch.

But git's simple "everything is a commit" model makes that impossible.

eMGm4D0zgUAVXc7 · on April 30, 2021

> Git desperately needs a layer above the commit that groups related commits together into a semantically commit-like object that you can show in history and jump to its HEAD and cherry-pick.

It's called a "branch" :) The commit-like object to represent it is the merge commit.

If you need to group related things in a branch in a more fine-grained fashion then do sub-branches and merge them into the branch with "--no-ff" so you get a merge commit for each to describe them.

u801e · on May 1, 2021

> Git desperately needs a layer above the commit that groups related commits together into a semantically commit-like object that you can show in history

That's what a merge commit does. The first parent is the base commit and the second parent is the head of the branch that was merged into the base branch.

So, if you run git log first_parent..second_parent, you would see a list of related commits (related by being in the same branch).
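
As a sketch, with a throwaway repository and made-up commit messages, the two parents of a merge commit can be written as `<merge>^1` and `<merge>^2`:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo one > feature.txt
git add feature.txt
git commit -q -m "Feature: step 1"
echo two >> feature.txt
git commit -q -am "Feature: step 2"

# main moves on independently before the merge.
git checkout -q main
echo hotfix > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix on main"

git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

# Commits reachable from the merged branch but not from the base branch.
git log --oneline "${merge}^1..${merge}^2"
branch_commits=$(git rev-list --count "${merge}^1..${merge}^2")
echo "commits grouped by this merge: $branch_commits"
```

The range lists only the two feature commits; the hotfix belongs to the first-parent side and is excluded.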

operator-name · on April 30, 2021

Can't you achieve something similar with `--no-ff` and tags?

Pxtl · on April 30, 2021

A tag defines an endpoint, not a group of related commits.

Yes, I could roll my own adhoc layer on top of git but it wouldn't have any tooling support.

"Well, you see you can infer the start of the rangE of commits for this feature by going back to the previoNOOOOOPE

operator-name · on May 2, 2021

Leaving branches around would also work, depending on the ad hoc implementation. I thought tags would be nicer, as they'd be less clutter than leaving thousands of branches.

> Yes, I could roll my own adhoc layer on top of git but it wouldn't have any tooling support.

Most git commands are ad hoc shell scripts built on git's internal data types. I can't see why adding your own would be that bad, a lot of external git tooling does this.

_haoa · on April 30, 2021

My old team’s git workflow required us to rebase our feature branches to develop before merging, and our branch could only contain a single commit. The commit needed a special formatting (short title, description, and story number) so it would get picked up by Jira scripts.

Although the history was super clean, the extra upkeep was a small annoyance. I felt like tearing my hair out when there were 10 other merge requests pending and never knew which would be the next to merge with develop. An auto-rebase feature (assuming no merge conflicts) would have saved me countless pointless minutes.

wbronchart · on April 30, 2021

I love clean linear history, but I don't like squash&merge, and I don't like the other options that the github interface gives you either.

Plug: I wrote a script recently that merges github pull requests but preserves linear git history (basically, rebase + merge)

https://pypi.org/project/git-pr-linear-merge/

epage · on April 30, 2021

This is baked into Azure DevOps, I'm surprised they haven't pulled it over into GitHub.

wbronchart · on May 1, 2021

Yeah, you'll find endless threads on github issues of people complaining Github doesn't have this yet. It's a real pain

nojvek · on May 1, 2021

I love squash merge. I’ve seen it being enforced in many places I’ve worked at.

You’d think losing commit history due to squash is a bad thing, but it’s actually a good thing in a large team. When you have 100s of PRs merged into master you want to see one atomic commit per PR. If something is wrong, then you can revert that individual commit.

It’s also much better signal:noise ratio. Most individual commits in PRs are like “fix”, “ugh! lint” and things like that. The PR title has much better names. GitHub squash merge adds title and link to pull request with the number. So if you want to see finer version of squash commit, you can always do that.

However, squash merges tell you one big thing in the main/master branch: “who did what when”. Git bisects to find when a regression was introduced work well. The whole idea is that a merged PR is a piece of work ready to be deployed. Individual PR commits are not.

bassdropvroom · on May 1, 2021

I've never understood the need for squash merging. One of the biggest advantages this article mentions is that the history will look neater. Well, what if I told you that you can have both? Git log has the `--merges` flag (I could be mistaken on the flag name), which will remove everything that's not a merge (it does require using `--no-ff`, which GitHub et al. use when merging PRs). Now you have a clean commit history that doesn't include the individual commits, as if it was squashed. But your real history still contains the individual commits, so something like bisect will give you much smaller commits to look at.

Granted, I'm not aware of github or the like having a view that only shows merges but git log command definitely does.
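
For what it's worth, `--merges` is a real `git log` flag. A quick sketch in a throwaway repository, with invented branch names and PR-style messages:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# First hypothetical PR: two noisy commits.
git checkout -q -b feature-a
echo a1 > a.txt
git add a.txt
git commit -q -m "wip"
echo a2 >> a.txt
git commit -q -am "ugh! lint"
git checkout -q main
git merge -q --no-ff -m "Merge PR: add feature A" feature-a

# Second hypothetical PR: one commit.
git checkout -q -b feature-b
echo b1 > b.txt
git add b.txt
git commit -q -m "fix"
git checkout -q main
git merge -q --no-ff -m "Merge PR: add feature B" feature-b

# Only the merge commits, one line per merged PR.
git log --merges --oneline

total=$(git rev-list --count HEAD)
merges=$(git rev-list --count --merges HEAD)
echo "total commits: $total, merge commits: $merges"
```

One caveat: `--merges` also hides commits made directly on the main branch (here, the initial commit), whereas `--first-parent` keeps them.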

da39a3ee · on May 1, 2021

Squashing PRs is mostly a good idea. However. When you fix a bug, you should nearly always:

1. Commit a failing test that reproduces the bug

2. Commit a fix

And I tend to feel that this sort of informative history should be preserved in the main branch, seeing as we have technology that is designed to do that.

welearnednothng · on May 1, 2021

This is one of the things I miss about Perforce - every merge was effectively a squash merge, but the branch was still there, just not visible by default (if I remember correctly). It was the best of both worlds - you keep a complete audit trail, but it was out of the way and didn’t interfere with your day to day view of the commit log. Moreover, there was never any discussion on merge strategies as there was just the one way to do it.

On top of that, they had some amazing tools for aiding in audits when you needed them.

I’m not suggesting we all switch to Perforce! But I do miss that aspect of it.

cratermoon · on April 30, 2021

It seems DNSimple has determined that the git commit messages are the System of Record not only for what changed, but why. But there are other ways. You could name the branch (or put in the comment a number) for a ticket in the bug tracker/kanban/whatever you use to track work. You could add a link in the commit to internal discussion or documentation about the work.

While a bit unorthodox, I'm not sure I can be against making the git repo history the canonical source for work history.

jakeva · on April 30, 2021

In my experience squash messages combine the messages of the commits they consolidate, and the actual merged branch still exists. I don't buy hard-line stances against squashed merges.

gouggoug · on May 1, 2021

That would be fine if the only important information in a commit was the message. But that's not the case.

A commit gives you a message, but also a specific line of code and its surrounding context. The message is just one more bit of information (an important one!).

By squashing, maybe you keep the messages, but without their associated changeset (which you lost when you squashed), these messages are of poor to null value.

cratermoon · on May 1, 2021

> In my experience squash messages combine the messages of the commits they consolidate.

I believe they do, by default, but the developer gets to modify. In other words, if I squash 3 commits, the git cli will set the resulting message to be a combination of all 3, but opens an editor to let me change it.

I'm not for or against any git merge/squash/rebase flow on technical grounds, but I do want every message in the commit log to stand on its own. In other words, I agree 100% with the intent of what DNSimple does, and if this is how they chose to do it, I have no disagreement. What matters to me is outcome: the commit history as a useful 'document' for how we got to where we are now.

jakeva · on May 1, 2021

I guess I haven't worked on a project important enough where the history is more important than the current state. Usually it's more like it might be useful to gain perspective by looking at the history, only to find it was a conflicted merge or the comment contained no useful info anyway ("Clean up", or "Fix this feature").

Even if it tried really hard to document the purpose of the change, I've rarely gotten more from a commit message than the content of the code changes in the commit.

cratermoon · on May 1, 2021

> I've rarely gotten more from a commit message than the content of the code changes in the commit.

That's generally true, but the article is specifically about how a rule/convention/expectation in DNSimple workflow ensures, or purports to ensure, that commit messages are informative. My experience is that it doesn't really matter if the explanation is in the commit message, a bug tracker ticket, or the team wiki, as long as I can go from commit -> understanding, it's good. I also say that if a programmer can't craft a useful commit message, they probably don't have good documentation for it elsewhere.

erik_seaberg · on May 1, 2021

Bug tracker searches are generally pretty poor, even if your team never migrated from one to another. During a 3 AM outage, I need to grep git log for affected files to find likely root causes from 2018 or whatever.

I want reviewers to insist on a commit message that says what’s going on here and why. Not a full design doc, but enough to narrow down which commits are related to something. Please don’t make me open a hundred bug tracker pages that may or may not still exist.
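
A sketch of that kind of search, in a throwaway repository. The file names (`dns.conf`, `billing.conf`) and the keyword are invented for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "ttl=300" > dns.conf
git add dns.conf
git commit -q -m "Add DNS config"
echo "plan=basic" > billing.conf
git add billing.conf
git commit -q -m "Add billing config"
echo "ttl=60" > dns.conf
git commit -q -am "Lower DNS TTL to speed up failover"
echo "plan=pro" > billing.conf
git commit -q -am "Switch default plan"

# Which commits touched the affected file?
git log --oneline -- dns.conf

# Which commit messages mention a keyword (case-insensitive)?
git log --oneline -i --grep="failover"

touching_dns=$(git rev-list --count HEAD -- dns.conf)
mentioning_failover=$(git log -i --grep="failover" --format=%s)
echo "commits touching dns.conf: $touching_dns"
```

Both searches are only as useful as the commit messages and commit granularity allow, which is the point being made above.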

chmaynard · on April 30, 2021

Here's an alternative that appears to accomplish the same thing:

  # checkout feature branch
  git switch feature

  # reset HEAD while preserving changes to working tree
  # commits on feature branch will become orphans
  git reset --soft main

  # commit all changes on feature branch
  git add -A; git commit -m "<feature description>"

  # checkout the main branch and merge
  git switch main
  git merge feature
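
As a rough check, the recipe above can be compared with git's built-in `git merge --squash` in a throwaway repository (branch names `feature-soft` and `main-squash` are made up). This sketch assumes main has not moved since the feature branch was created. If main has advanced, `git reset --soft main` followed by a commit records the feature branch's tree as-is, which would undo main's newer changes, while `git merge --squash` performs a real merge.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo one >> app.txt
git commit -q -am "Step 1"
echo two > extra.txt
git add extra.txt
git commit -q -m "Step 2"

# Variant A: the reset --soft recipe, run on a copy of the feature branch.
git checkout -q -b feature-soft feature
git reset -q --soft main
git commit -q -m "Feature description"
tree_soft=$(git rev-parse 'HEAD^{tree}')
commits_soft=$(git rev-list --count main..HEAD)

# Variant B: the built-in squash merge, run on a copy of main.
git checkout -q -b main-squash main
git merge -q --squash feature
git commit -q -m "Feature description"
tree_squash=$(git rev-parse 'HEAD^{tree}')
commits_squash=$(git rev-list --count main..HEAD)

echo "soft:   $tree_soft ($commits_soft commit)"
echo "squash: $tree_squash ($commits_squash commit)"
```

Under that assumption both variants produce a single commit on top of main with the same tree.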

globular-toast · on April 30, 2021

Yes. There are many ways to achieve the same thing with git.

FalconSensei · on April 30, 2021

I love squash-merge, and have never had to see which specific commit inside a PR/MR changed a line. In my current project we use `JIRA-### Title` as the MR title (GitLab). All information is in the ticket. Also, if I needed to see the detailed commit history for that merge, GitLab still has it. We try to keep our tickets small, so no merge should have a long commit history anyway.

beirut_bootleg · on May 1, 2021

It's all fun and games until they try to use lerna, when they discover it tries to determine packages that have changed since the last tag but says all packages are brand new, because all of their tags are now dangling somewhere after the commit they were attached to got squashed.
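
The dangling-tag effect can be sketched without lerna, in a throwaway repository. The tag name `v1.0.0` and the commit messages are hypothetical; the sketch only shows that a tag placed on a branch commit is no longer reachable from main after a squash merge.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A release tag is placed on a commit that lives on the feature branch.
git checkout -q -b feature
echo "version=1.0.0" >> app.txt
git commit -q -am "Bump package version"
git tag v1.0.0

# The branch lands on main as a squashed commit and is then deleted.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Release (squashed)"
git branch -q -D feature

all_tags=$(git tag)
reachable_tags=$(git tag --merged main)
echo "tags in repo: ${all_tags:-none}"
echo "tags reachable from main: ${reachable_tags:-none}"
```

Any tool that looks for the most recent tag reachable from the current branch, as `git describe` does, will not find `v1.0.0` here.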

s17n · on May 1, 2021

I didn't realize that anybody didn't use squash, tbh.