Hacker News

Do people out there actually squash commits? Granted, I haven't changed workplaces much in my career, but at no place I've worked did people squash commits. What's even the point of it? It's not like people routinely read the commit history, and when they do, they really would like a complete story, not 20 gargantuan commits that contain 3 years of development.



I find the individual commits on feature branches to be more noise than signal after they are merged. They can be useful during review, sometimes, but mostly I want a cleaner history.

> they really would like a complete story, not 20 gargantuan commits that contain 3 years of development

That sounds like maybe we split up work differently. 3 years of development for me or my team would likely have hundreds+ of merge/squash commits, not 20 large ones.


What do you use your "cleaner history" for? I seldom go back and look at historical commits unless I'm debugging, in which case I'd prefer to know what actually happened at the time with as much information preserved as possible.


A clean history with proper commit names can work as a changelog for your app. We actually did that and it worked quite well. In Azure DevOps the release pipelines have access to the list of commits since the previous release, so one only has to take the titles of those commits to build it.
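Outside Azure DevOps, roughly the same changelog can be pulled from plain git. A minimal sketch, assuming each release is tagged (`v1.0` is a hypothetical tag) and that commit titles on main are meaningful; the throwaway repo, file names and messages are invented for illustration:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

# Hypothetical history: one tagged release, then two new commits on main.
echo 1 > app.txt && git add app.txt && git commit -q -m "Initial release"
git tag v1.0
echo 2 > app.txt && git commit -q -am "Add CSV export"
echo 3 > app.txt && git commit -q -am "Fix login redirect loop"

# Commits reachable from HEAD but not from the last release tag, oldest first,
# one bullet per commit title.
changelog=$(git log --reverse --format='- %s' v1.0..HEAD)
printf '%s\n' "$changelog"
```

Where branches land as merge commits instead of squashes, adding `--first-parent` should keep the list to one line per merged branch.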


You can achieve that with PR titles instead of commit titles.


But then... PR titles don't show up in the git log.


I’d prefer that too, but preserving all commits often leads to things like: ‘Changed file 1’, ‘Changed file 2 and 3’ etc.

You’ll have a ton of noisy commits that together make up one full feature. In this case having all that noise squashed into one commit with a proper description is much nicer.


For removing code that's buggy in production. It's way easier to find and remove 1 commit.


Isn’t that what rolling back releases is for? i.e. revert the container (or whatever your unit of release is) to the previous version? If you’re removing individual commits and re-releasing (rolling forward), you’re still at risk of being down, because you’re not going back to a known good release.


And what do you think about the linked article that shows how it's easier to find exactly which code is buggy if you have it in small commits instead of one big squashed one?


Good, do what he did too... but in a feature branch.


But surely you would like to squash those merge commits as well, at some points? Hundreds of merge/squash commits pollute the history almost as much as thousands of normal commits, so after a large feature set is done, one should just squash all of it into a single commit.

That way, you can have "nice" development history where tags for the old versions are redundant since they match commits one for one:

    $ git branch
    main

    $ git log --oneline
    abcd000 (HEAD -> main) version 5.2
    abcd111 version 5.1
    abcd222 version 5.0
    ef01234 version 4.7
    9876543 version 3.4
    fedb123 version 2.12
    dedc456 version 1.128


The git log isn't a release history, it's a development history. Squashing commits throws away context that could be useful in any number of ways.

There's nothing wrong with squashing a bunch of commits that really should have been a single commit from the start:

- Update X to do Y

- Fix typo in X

- Add "foo" option to X for when Y is a bar

But most of the time "features" consist of many changes: we add one function we're going to need, then another, then yet another... Then we change an API to expose the new functions, then extend the UI to make room for it, and finally put it into the application and pull all the strings together.

Squashing all these into one commit is just a bad idea. You can always do git rebase -i before merging to do minor fixups or even reorder your commits (like when you notice a typo after several other commits), but completely removing all granularity... just no.


When you have to find a subtle one-line breakage that happened years ago, you’ll be happy that you had a single commit with 30 lines of code instead of a squashed commit with 3000. It makes it much easier to isolate the cause of a problem and fix it in many cases.

I’m asked “how long has this been broken and what caused it?” and answer it via git blame or bisect probably every few weeks. This is life on legacy projects.
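For the "what caused it?" half, `git bisect run` can do the binary search unattended, and the culprit's commit date answers "how long". A minimal sketch on a throwaway repo; the config file, its values and the `grep` check are hypothetical stand-ins for a real regression test:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

# Hypothetical history: one of these commits silently changes timeout=30 to timeout=3.
echo "timeout=30" > config.txt && git add config.txt && git commit -q -m "Add config"
good=$(git rev-parse HEAD)
echo "retries=5" >> config.txt && git commit -q -am "Add retries"
sed 's/timeout=30/timeout=3/' config.txt > config.tmp && mv config.tmp config.txt
git commit -q -am "Tidy config"
echo "verbose=no" >> config.txt && git commit -q -am "Add verbose flag"

# HEAD is bad, the first commit is known good; the command's exit status decides
# each step (0 = good, 1-127 except 125 = bad).
git bisect start HEAD "$good" >/dev/null 2>&1
git bisect run grep -q 'timeout=30' config.txt >/dev/null 2>&1
culprit=$(git log -1 --format=%s refs/bisect/bad)   # what caused it
since=$(git log -1 --format=%cd refs/bisect/bad)    # how long it has been broken
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit ($since)"
```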


I think you just highlighted one source of the disagreement. Nobody on any team I work with would accept a PR with 3000 lines of change. Each team had a policy, whether formal or informal, to break large work items across PRs so they were more easily reviewable. I would say that 300 lines changed is getting towards the bigger end of what we would accept. If your choice is between 30 and 3000 lines changed per commit, then for sure I'd pick 30, but rather than either, I think a 300-line limit is more sensible.


Yeah, that’s nice in theory, but in practice the situation is often less ideal. Some features, partially implemented, would break existing functionality if not completed, and merging those upstream prior to total completion is therefore impossible. And on the other side of things, some environments and features require large, systemic changes. This is, surely, an organizational failing, but one that we must adapt to.


Then that's the other half of my point. If you can limit PRs to 300 lines, then merging each PR as a single commit is better. If you can't, then maybe squashing isn't ideal.


Convert all the JS files to TypeScript and suddenly you get blamed for everything.
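Since Git 2.23, `git blame` can look past such mechanical commits: `--ignore-rev <commit>` (or a list of commits passed via `--ignore-revs-file`, or configured as `blame.ignoreRevsFile`) attributes the lines those commits touched to earlier commits where it can match them. A small sketch with a hypothetical one-line "formatter" commit; for larger rewrites git can only guess, so some lines may stay attributed to the ignored commit:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config commit.gpgsign false

# Alice writes the logic; Bob later runs a (hypothetical) formatter over it.
echo "const total = 10" > app.ts && git add app.ts
git -c user.name=Alice commit -q -m "Add total"
echo "const total = 10;" > app.ts
git -c user.name=Bob commit -q -am "Run formatter"
fmt=$(git rev-parse HEAD)

# --porcelain prints an "author <name>" line for the blamed commit; extract the name.
plain=$(git blame --porcelain app.ts | sed -n 's/^author //p')
ignoring=$(git blame --porcelain --ignore-rev "$fmt" app.ts | sed -n 's/^author //p')
echo "plain blame: $plain; ignoring the formatter commit: $ignoring"
```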


That's not a squash or not problem, that's a your tickets are too large problem. If you normally end up with 3000 line feature commits, you're trying to do too much with individual feature changes.


Are we pretending that we, as devs, have much say in this? If our management says they want a feature that’s going to take 3000 lines of code, you don’t have a choice. It’s nice if you’re somewhere where you can roll out feature MVPs, but in some environments you don’t get that luxury.

And some features are just monsters, either through the nature of the feature or architectural choices that were made before it was conceived.


Then you split the feature up? It’s generally possible to do this.


I mean, I’m the one arguing for smaller commits, so yes, if you make it 10 commits via 10 prs instead, that’s fine with me. But if all of that has to go out at the same time, that’s not any different than one pr with 10 commits (and probably worse because it’s harder to see all the changes together).


I actually occasionally make those kinds of pull requests and would recommend that over the alternative.


I think based on this discussion I can identify one situation where squashing would be good: when you're doing repeated trial-and-error commits on a single file. That way, squashing won't turn it into 3000 lines of code, it just hides the unsuccessful attempts.

But other than that? Small commits please.


If you look at the PRs list, you will have a clean and tidy history.


Yep, but many tools (most notoriously git bisect, famous from the article) don't understand "--first-parent", which is ridiculous. The only way to have a history that every tool treats as clean is basically to outlaw merges.
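For what it's worth, newer Git (2.29 and later) does accept `git bisect start --first-parent`, which only walks the first-parent chain of the current branch and never lands on commits inside merged branches. A minimal sketch, assuming a throwaway repo where a feature branch contains a broken "WIP" commit and the real regression is committed directly on main (all names invented):

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo ok > status.txt && git add status.txt && git commit -q -m "Initial"
good=$(git rev-parse HEAD)
# Feature branch whose intermediate commit is broken, fixed before the merge.
git checkout -q -b feature
echo broken > status.txt && git commit -q -am "WIP feature"
echo ok > status.txt && echo feature > feature.txt && git add feature.txt && git commit -q -am "Finish feature"
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
# The real regression lands directly on main.
echo regression > status.txt && git commit -q -am "Refactor status handling"
echo docs > README && git add README && git commit -q -m "Docs"

# Only test commits on main's first-parent chain; "WIP feature" is never visited.
git bisect start --first-parent HEAD "$good" >/dev/null 2>&1
git bisect run grep -qx ok status.txt >/dev/null 2>&1
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit on main: $culprit"
```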


> That sounds like maybe we split up work differently. 3 years of development for me or my team would likely have hundreds+ of merge/squash commits, not 20 large ones.

With the people I've worked with, I'd say most don't commit at all until they think the code is "ready" and they commit all at once. In the teams I've worked with, squashing vs not squashing isn't the question. I just want them to commit/push as soon as they've hit a stopping point or at least once a day. Maybe the people you've worked with are good with git but I am not that good with git.

I'm still stuck on exercise 6 ("Resolve a merge conflict") on Git Exercises because I made one too many commits, and now the exercise says I have too many commits.

https://gitexercises.fracz.com/

Previously on HN: https://news.ycombinator.com/item?id=24671638


You need to squash your commits, then: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History


The argument for squashing is that minor updates like spelling, renaming, or test fixes during initial development can really clutter the history if they each have a commit. Many would rather see the actual change "Update X to use Y instead of Z" in the history, and minor details like "Fix mock in XTestCase" or "Perform renames from code review" within that single commit.

I'm sort of agnostic on this issue, but I do feel the article's author kind of overstated it here. Say 5 to 10 commits of this size were squashed together. Git bisect would've taken him 90+% as far and he'd have to read code or manually trial and error changes just slightly more. The binary searchable problem space would be slightly smaller, and the linear manual effort space slightly bigger. Less good, but really not that big of a deal.
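The arithmetic behind that estimate, as a rough model (an assumption, not a measurement of git bisect itself): each bisect step halves the candidate range, so N candidate commits need about ceil(log2 N) test runs. Squashing in groups of ten therefore saves only about three steps, in exchange for reading a ten-times-larger diff at the end:

```shell
set -eu

# Number of halvings needed to narrow n candidate commits down to one,
# i.e. ceil(log2(n)) using integer arithmetic only.
bisect_steps() {
  n=$1 steps=0 span=1
  while [ "$span" -lt "$n" ]; do
    span=$((span * 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

# 1000 small commits vs. the same work squashed into 100 commits of ~10 each.
echo "unsquashed: $(bisect_steps 1000) steps"
echo "squashed:   $(bisect_steps 100) steps, then a manual look at ~10 commits' worth of diff"
```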


For me, it's the other way round: minor updates like spelling, renaming, or test fixes during initial development can really clutter the major updates.

If I am making a commit that makes a complex but important change to some significant application logic, I want that commit to contain that change and only that change, so that when I have to re-read it a year later, it's completely obvious what I did and why. Bundling a load of refactoring and cleanup in there is a significant speedbump for my understanding.

Years ago, a sage pointed out the argument for squashing is really an argument for better tools. Imagine if you could flag commits as being of two types - major/minor, significant/insignificant, feature/refactoring, foreground/background, melody/rhythm, etc. Then imagine if the tools would by default hide, roll up, or otherwise de-emphasise the commits of the latter kind. This whole apparent dichotomy would go away in a flash.

This idea is floating around in the Wiki world. I believe it was Ward's Wiki that introduced a 'minor edit' checkbox in the editor; if a change was marked as a minor edit, it wouldn't be shown on the recent changes feed.

You can imagine other ways to get somewhere similar. For example, you could have a special kind of commit that just groups a previous run of commits, and the tools could show that and hide the members of the group by default. There are probably many other ways to do this.
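Something close to the "minor edit" checkbox can already be approximated with a message convention plus git log filters. A sketch, assuming a team agrees to prefix minor commits with `minor:` (a made-up convention, not a git feature); `--invert-grep` hides them from the default view while the full history stays intact:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo a > invoice.txt && git add invoice.txt && git commit -q -m "Add invoice export"
echo b >> invoice.txt && git commit -q -am "minor: fix typo in export"
echo c >> invoice.txt && git commit -q -am "Support PDF invoices"
echo d >> invoice.txt && git commit -q -am "minor: rename variable"

# De-emphasised view: drop commits whose message has a line starting with "minor:".
major=$(git log --invert-grep --grep='^minor:' --format=%s)
# Nothing was thrown away: the full history still has every commit.
total=$(git rev-list --count HEAD)
printf '%s\n' "$major"
echo "($total commits in full history)"
```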


There is a special kind of commit that "just groups a previous run of commits": it is called a merge commit. Options like --first-parent can restrict git log/git blame to just the "top level" merge commits. The only real change needed is that this isn't the default in most tools; they don't have to show the full graph by default and could focus on a specific depth instead.
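To illustrate, a sketch on a throwaway repo where a feature branch with "wip" and "fix typo" commits is merged with `--no-ff` (all names invented). With `--first-parent`, git log shows one entry for the merged branch, and git blame attributes a line to the merge that brought it into main rather than to the commit where it was written on the branch:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "exprot csv" > export.txt && git add export.txt && git commit -q -m "wip"
echo "export csv" > export.txt && git commit -q -am "fix typo"
git checkout -q main
git merge -q --no-ff -m "Add CSV export" feature
echo more >> app.txt && git commit -q -am "Bump version"

# Only the "top level" history of main: one entry for the whole merged branch.
top=$(git log --first-parent --format=%s)
# Blame the line as it arrived on main (the merge), not where it was written.
origin=$(git blame --first-parent --porcelain export.txt | sed -n 's/^summary //p')
printf '%s\n' "$top"
echo "export.txt line came in with: $origin"
```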


Interesting idea, however this could also be easily abused for nefarious use.


I squash commits. Well, I rebase so that changes are logical rather than historical. This is because when people read the commit history, which actually happens regularly, they would actually like a complete story, not 15 commits of "fix audit", "fix review" and "fix typo" - let alone refactors where previous work in the PR is thrown out, which you can by definition never care about.

Commits so small that they're nonfunctional also break bisect.


Nobody is suggesting you should make commits so small they're non-functional.


Are you simply suggesting the removal of those non-functional commits to be called something else than "squashing"?


I'm suggesting you don't make them in the first place.

If you do make them by mistake - and everyone does sometimes, certainly including me - then sure, commit --amend or squash them before pushing them up.

The squashing the article is talking about is collapsing functional, distinct commits into one when merging to the trunk. It would be useful if we had distinct terms for the two kinds of squashing; the git command commit --fixup and its associated interactive rebase operation suggest the name "fixups".
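For readers who haven't used it, the fixup workflow looks roughly like this. A sketch on a throwaway repo with invented commits; `GIT_SEQUENCE_EDITOR=:` simply accepts the todo list that `--autosquash` generates instead of opening an editor:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)
git checkout -q -b feature
echo "parse confg" > parser.txt && git add parser.txt && git commit -q -m "Add config parser"
parser=$(git rev-parse HEAD)
echo "config flag" > cli.txt && git add cli.txt && git commit -q -m "Add config CLI flag"

# Later, a typo turns up in the parser commit: record the fix as "fixup! Add config parser".
echo "parse config" > parser.txt && git commit -q -a --fixup="$parser"

# --autosquash moves the fixup right after its target and melds it in.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$base"
history=$(git log --reverse --format=%s "$base"..HEAD)
printf '%s\n' "$history"
```

The two functional commits survive as distinct commits; only the fixup disappears into its target.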


The difference between fixups and squashing is only in how the commit message is generated.

Both are (interactive) rebase operations. Both are established terms in git, and it is probably not very useful to change that nomenclature now.


They're the same technical operation, but they're used in different ways, and it's the difference in use that matters. It's useful to be able to distinguish them. It's never too late to improve our terminology! How long had git diff --cached been around when git diff --staged was introduced?


When I read the commit history I want to see what was committed.

"Hmm, I had a half working X509 chain resolver there that turned out to be unnecessary at the time but would save me a day's work now..."


Keep it on a local branch?


I'd moved workstation by then. If distributed source control is good for anything, surely it's reducing reliance on local state!

I don't think there's an easy answer here. I think there are good things to be said for a readable and bisect-able version history, but also that if you're not preserving your real commit history in a way that's backed up remotely then something's probably wrong.

I'm beginning to think this area, this schism into two schools of thought, is really a signpost that there's something lacking in git's branching. Or something everyone is missing, including me :)


Bisect doesn't have a problem with intermediate broken states, actually. It's why git bisect skip exists. It may not be able to pinpoint an exact commit though.
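With `git bisect run`, a test script signals "can't test this one" by exiting with code 125. A minimal sketch, assuming an unbuildable commit can be detected cheaply (a hypothetical `build.txt` marker stands in for a failing build here); as long as the skipped commit isn't right next to the regression, bisect still names a single culprit:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

# Helper: overwrite FILE with TEXT and commit it with MESSAGE.
commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }
commit value.txt ok "Add value"
commit build.txt fine "Add build marker"
good=$(git rev-parse HEAD)
commit notes.txt one "Notes 1"
commit build.txt broken "WIP: half-done refactor (does not build)"
commit build.txt fine "Finish refactor"
commit value.txt bug "Change value handling"
commit notes.txt two "Notes 2"

git bisect start HEAD "$good" >/dev/null 2>&1
# Exit 125 = "cannot test this commit, skip it"; otherwise 0 = good, 1 = bad.
git bisect run sh -c 'if grep -q broken build.txt; then exit 125; fi; grep -qx ok value.txt' >/dev/null 2>&1
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```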


Bisect does turn into my colleague's feature branch with 20 broken commits and unrelated content, though, when I just want to bisect the main branch. That's infuriating and IMHO makes bisect almost useless. That there is no git bisect --first-parent is one of the greatest mysteries of git.


I often do intermediary commits that don't compile or break something significant, only to make sure that I don't lose the code. When that happens I always squash my commit afterwards once the code is in a usable state.

The alternative is making bisecting harder which is not something that I want. I want every commit to compile and be testable individually. That definitely doesn't mean that I think it's a good idea to have "gargantuan" commits.

An ideal commit should contain a single, atomic change to the codebase. Not more, but also not less.
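One way to check that goal before merging is `git rebase --exec`, which replays the branch and runs a command after every commit, stopping at the first one where the command fails. A sketch on a throwaway repo, with a trivial `grep` standing in for a real build-and-test command (an assumption, not a prescription) and `check_each_commit` as an illustrative helper name:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo ok > status.txt && git add status.txt && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)
git checkout -q -b feature
echo "step 1" > work.txt && git add work.txt && git commit -q -m "Step 1"
echo "step 2" >> work.txt && git commit -q -am "Step 2"

# Replay the branch onto its own base, running the check after every commit.
check_each_commit() {
  if GIT_SEQUENCE_EDITOR=: git rebase -q -i --exec 'grep -qx ok status.txt' "$base" >/dev/null 2>&1; then
    echo pass
  else
    git rebase --abort >/dev/null 2>&1   # put the branch back as it was
    echo fail
  fi
}
first=$(check_each_commit)

# A broken intermediate commit followed by its fix: the tip is fine, one commit is not.
echo broken > status.txt && git commit -q -am "WIP"
echo ok > status.txt && git commit -q -am "Fix WIP"
second=$(check_each_commit)
echo "clean branch: $first; with a broken WIP commit: $second"
```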


I'm a bit confused on your comments here. You state "I often do intermediary commits that don't compile ... to make sure that I don't lose the code" And follow that with "I want every commit to compile and be testable individually."


I'm the same as OP. I commit constantly when hitting a natural stop point or when taking a break and push up to my branch/pr. I hate having code locally that hasn't been pushed to the origin.

But doing this means I have many commits that don't mean anything or are unfinished and in an uncompilable state.

So I git rebase and massage the history to make more sense.


I mean that I do many small "crap" commits while developing, once I'm ready to merge into the main branch I clean the history to get proper, atomic commits.


I squash commits on large PRs on a case-by-case basis, especially when they have a lot of very small and silly commits (fix this, fix that, wip, ...) but implement a very specific feature which required a lot of experiments that didn't end up in the final PR. And yes, I use the commit history very frequently, so it's important for me that this doesn't contain too much noise.

If I need to bisect a bug that was introduced by the PR I still can do this in the original branch.


Ideally, that would give you the best of both worlds. But some places delete the old branches after they're merged. It would be nice if there were an easy way to hide or rename old branches so that the in-use ones stand out.


Everyone who works on Linux must rebase and squash commits so that every commit builds and passes tests, and does a single thing. More about how Linux uses git:

https://www.mail-archive.com/dri-devel@lists.sourceforge.net...

https://www.linux.com/news/why-linuxs-biggest-ever-kernel-re...


I'm probably 75/25 on this. 75% of the time I'm gonna squash, because my commit messages are all along the lines of "got to this point" or "fix frotzolate when x=7". 25% of the time my first round of squashing yields good commit messages that are isolated and complete, so it's better to leave those as separate commits when merging into the mainline.

I also aggressively reorder my commits. When it gets to yak shaving you basically develop in a stack, so you end up with a half-functional commit to system A at the top, then a complete commit to system B, then the finishing commit to system A, so it's better to just rearrange it so that you have one system B commit followed by one system A commit.

But I do routinely read the commit history (using git blame) and no I don't want to see the "complete story" of my past self having "got to this point"--I want the documented MR commentary.


Squashing is kind of a blunt tool that is useful sometimes, but it can be perfectly replaced either by crafting your commits with more care or by doing interactive rebasing.

I once had to enforce squashing on a developer who wanted to make 20 commits per PR, titled like "wip" "wip" "wip" "fix bug" "fix mistake" "format code", in a PR that only changed like 10 lines of code in the end.

In this case, the main problem for me was that git blame became useless because this developer was touching way more lines than necessary and then undoing it using the code formatter.


I think what you really want is neither of squashing nor retaining: what you want is to take the messy history you had and then reconstruct a history that makes sense and commit that; thankfully, git makes this easy. I hate it when people leave messy thoughts in the commit history as I do use the history, and it is even worse if they just leave mega-commits. I want to see easy to review step by step organized thoughts designed to help the reader appreciate the series of steps needed to bring you from where you were to where you went.


I would say the best thing is to cultivate the habit of thinking and working cleanly, so you create a history of small, logical, incremental commits.

But the second best thing is to think and work messily, and use rebasing to fake a history of small, logical, incremental commits!


It really depends on where you work and how your company's repo is organized. For instance, where I work 20 squash commits would represent, at most, 10 minutes worth of commits to the monorepo. Not squashing on merge would quickly turn thousands of daily commits into tens of thousands. It's already impractical to find suspect commits via git, and we typically use our code review tool to find the change that broke something (and that change includes the developer's full branch history). Adding more commits would needlessly slow down the tooling that displays 'git blame' results (and likely other commands).

I suspect there is a pattern in the comments here. People who work on small teams with granular repos that, individually, don't see a lot of daily activity think squashing is bad and erases valuable history. People who work with large repos that see high commit velocity (like a monorepo) think squashing (at least merge squashing) is beneficial and don't see the loss of information as problematic because it's hard to access it in the first place. Maybe I'm just projecting my own opinions on this; I'd like to hear perspectives that conflict with my assumptions.


You are definitely correct. People who argue against squashing have not worked on 10+ years old actively developed repositories, or in big enough teams.

A commit is a change. A change has a ticket. A ticket is a small piece of work that does not result in 3k changed lines.

This ensures that the change rationale is fully documented and easily identifiable.

Nobody needs those "fixed typo" commits. Nor the "implemented function A" commits. What IS the change that you're doing? What functionality? That's the most comfortable commit granularity to debug imho.


Agreed, I regularly add comments later as well. Understanding often comes after the code works.

I could obsessively fiddle with every commit like an artisanal snowflake, or I could click the squash checkbox on the request.


If you always work in feature branches and commit often (which you should IMO) it makes perfect sense to squash on merge. It gives you a nice history that only contains relevant commits instead of having a bunch of “added tests”/“fixed XYZ”/“remove debug log”/etc commits.

With some discipline this makes the commit history actually worth reading and makes git blame a useful tool.


You don't have to squash into ONE commit just because you squash.

If I make 10 commits I can squash it into two logical commits (e.g. refactor, add feature) then I merge the branch with those two.

If the branch is small and has 3 commits that are one logical change, then I might as well squash to main instead of merging.

Commit history is readable regardless (I'd never use anything but --first-parent ever).
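One way to do the "10 commits into two logical commits" step, sketched on a throwaway repo with invented files: `git reset --soft` drops the messy commits but keeps their combined result staged, and `git commit` with explicit paths records only those paths, leaving the rest staged for the next commit. (An interactive rebase with squash/fixup is the other common route.)

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo "v1" > util.txt && git add util.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
# Messy branch history mixing a refactor and a feature (three commits for brevity).
echo "v2, refactored" > util.txt && git commit -q -am "wip"
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "more wip"
echo "feature, done" > feature.txt && git commit -q -am "fix"

# Drop the messy commits but keep their combined result staged.
git reset -q --soft main
# With explicit paths, git commit records only those paths; the rest stays staged.
git commit -q -m "Refactor util" -- util.txt
git commit -q -m "Add feature"
log=$(git log --reverse --format=%s main..HEAD)
printf '%s\n' "$log"
```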


  # Implement feature
  # Whoops fixed issue in feature i just implemented
  # Add in whitespace
  # Remove whitespace
  # Forgot place to add in whitespace
  # Fix variable name for feature
Vs

  # Implemented feature "x"
Which one's easier to roll back and read?


Routinely. Common example that I can't see why anyone would take issue with:

  * Commit 1: Fix a bug
  * Commit 2: Fix linting issues with the fix discovered through CI
  * Commit 3: Remove dead/commented code introduced through commit 1
  * Commit 4: Update documents as required by change
I'd always squash those into a single commit before merging into upstream.


I can see a good case for separating #2 from everything else. Or depending on the change, maybe squashing #2 and #3 together separate from 1 and 4.

It's a good idea to isolate the bugfix so that reading the diff is clear. Stylistic changes such as #2 and #3 can be separated and called out as "should produce no behavior change" sort of updates.


I do, all the time. I highly recommend others do it too. I like to keep to 1 commit per ticket (even a super large ticket that might take me weeks of development). But having a commit history like the fella in this story is nice too. So what you should do is work in a feature branch. On the day of deployment I'll squash, and cherry pick (in my case to a train). If the deployment goes bad, and my code is at fault, then all that is required to fix it is a single revert.

If you want to continuously deploy to production (with real users using it), you need a process to very quickly revert bad commits from it. At my last company with hundreds of developers, we would push to prod several times a day. Each deployment would include several devs' changes. A clean linear history is essential to making that process work.
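The squash-then-revert flow, sketched on a throwaway repo (the ticket ID and files are hypothetical): `git merge --squash` stages the branch's combined change without creating a merge, the following commit records it as one commit, and undoing the whole ticket is a single `git revert`:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo v1 > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo v2 > app.txt && git commit -q -am "wip"
echo v3 > app.txt && echo new > extra.txt && git add extra.txt && git commit -q -am "finish feature"
git checkout -q main

# Stage the branch's combined change on main without recording a merge...
git merge -q --squash feature >/dev/null
# ...then record it as a single commit (TICKET-1 is a hypothetical ticket ID).
git commit -q -m "TICKET-1: add feature"
squashed=$(git rev-parse HEAD)

# If the deployment goes bad, one revert undoes the whole ticket.
git revert --no-edit "$squashed" >/dev/null
echo "app.txt after revert: $(cat app.txt)"
```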


You get a similar benefit (a single revert to revert it all) via "git merge --no-ff BRANCH". If the BRANCH is a fast-forward (i.e. it's been rebased to master/main & tested before merge) then you get both benefits of a clean history and an easy revert, for little downside.

Keeping the history of each incremental change, even in a branch, is IMO too useful to give up.

Note I'm not advocating keeping those "fixed typo in previous commit" fix-up commits: those should be properly fixed up _before_ merge, by judicious use of git rebase.
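A sketch of that approach on a throwaway repo (names invented): the `--no-ff` merge keeps the branch's individual commits, and `git revert -m 1` undoes the whole merge relative to its first parent, i.e. mainline. One known caveat: after reverting a merge, merging the same branch again later will not bring those changes back unless the revert itself is reverted first.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo v1 > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo v2 > app.txt && git commit -q -am "Add helper"
echo v3 > app.txt && git commit -q -am "Use helper in UI"
git checkout -q main

# Record a merge commit even though a fast-forward would have been possible.
git merge -q --no-ff -m "Merge branch 'feature'" feature
merge=$(git rev-parse HEAD)

# Undo the whole branch in one step: -m 1 reverts relative to the first parent (main).
git revert --no-edit -m 1 "$merge" >/dev/null
echo "app.txt after revert: $(cat app.txt)"
git --no-pager log --oneline --graph   # the branch's individual commits are still there
```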


Wait, I must be missing something. I too develop in feature branches, and then they're merged into develop or main (with a merge commit, obviously). Reverting that is also only a single revert. The only problem is that sooner or later those feature branches do get deleted, and if you squashed, you lose all the non-squashed history. If you don't, it's still in the develop/main branch.

What am I missing?


Personally I think of working in a branch as the process of creating a patch set. A patch in a series should only depend on its predecessors. You can submit more than one patch in a pull request.

I would never send out a patch set for review that includes all the "oops" "typo" "iteration 50" etc. commits I make as I work on code. Those are pure noise.

However, git is a tool for development, and during development I should be able to use git in whichever way is convenient for me: commit, rewrite and do whatever the hell I want with my local history. Not having that freedom is the primary reason why working with most non-distributed version control systems is such a pain.

When it comes to actually merging patch series to master, I like not doing fast-forward merges, since you maintain a natural grouping of the applied changes.


At my workplace we generate ~150 commits every week or two. There are a lot of trash commits and the history is completely unreadable.

I'm sure that's barely any compared to large companies, so I really question the value of commit history unless message guidelines are well enforced.


We go through ~600 commits per day. In my experience, there's nothing worse than looking for when a value was changed from 10 to 100, and finding it in a "Merged from X" commit without the history of why the value was changed from 10 to 100.

I don't often (ever?) browse the history, but I _do_ regularly search the history using tooling, (git log | grep <something>), and more is more in that case, even if the history isn't perfect.
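For searching the content of changes rather than commit messages, git log has built-in "pickaxe" options that are usually more precise than piping `git log` through grep: `-S<string>` lists commits that change how many times the string occurs, and `-G<regex>` lists commits whose added or removed lines match. A sketch on a throwaway repo with an invented settings file:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
git config commit.gpgsign false

echo "settings service" > README && git add README && git commit -q -m "Initial commit"
printf 'name=app\nlimit=10\n' > settings.txt && git add settings.txt && git commit -q -m "Add settings"
echo "debug=no" >> settings.txt && git commit -q -am "Add debug flag"
sed 's/limit=10$/limit=100/' settings.txt > settings.tmp && mv settings.tmp settings.txt
git commit -q -am "Raise limit for bulk imports"
echo "mode=fast" >> settings.txt && git commit -q -am "Add mode"

# -S<string>: commits where the number of occurrences of the string changed.
introduced=$(git log -S'limit=100' --format=%s)
# -G<regex>: commits whose added or removed lines match the regex.
touched=$(git log -G'limit=' --format=%s)
echo "limit=100 introduced by: $introduced"
printf '%s\n' "$touched"
```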


That just sounds like a bad squash.


If the original commits are:

- Change X to 10 by Foo

- Change X to 100 by Bar

- Change X to 50 by Baz

And that gets squashed into `Change X to 50 by Baz`, you lose the context as to why Foo and Bar changed it, and the values. If I need to go investigate an issue with X, I'd rather have the history of all the changes.


Commits by multiple people should rarely, if ever, be squashed.


>Do people out there actually squash commits?

Depends on the git workflow being adopted by the committers.

HNers constantly espouse how clarity is more important than cleverness. It may be the case that a single commit offers more concision than multiple commits which could be perceived as just more noise.


I guess my development culture is flawed, but as a one-man team I'm always time-constrained. So I sometimes don't have enough time to write detailed commit messages.

So I end up with "one commit per feature" rather than "one commit per logical change". That's why I actively use squash during interactive rebase when I prepare to merge a completed branch, since my commit history sometimes looks like this:

   Backend: new set of APIs for XYZ
   Backend: implement feature X (+ some long comment)
   Frontend: implement feature X
   fix for feature backend
   fix for backend API XYZ
   front fix
   backend fix
Yeah again I know I could do better, but yeah I use squashing for this reason. A lot.


Well, uh, yes, that's what the commit history generally looks like (although we do generally put "JIRA-9999: " in front of all commit messages, for context). So what?


What I wanted to say is that since not many people besides me look at my code, there's no reason to preserve the real development history of every feature, and IMO 3 "feature commits" are preferable to my development mess of 50 "fix that, fix this" commits.


> Do people out there actually squash commits?

Yes, squashing parts of a discrete piece of work together, where it makes sense, then makes it trivial to git bisect any future problems.

Code committed to the mainline should always compile, and should be made of discrete changes. However, do what you like on your unpublished local branch.

> What's even the point of it? It's not like people routinely read the commit history, and when they do, they really would like a complete story, not 20 gargantuan commits that contain 3 years of development.

My company routinely reads the commit history. The commit is usually the 'why', and the code is the 'how'.


It's a mix where I work. Most people aren't at all tidy with their commit history, and I'll often see a PR with a fairly small final diff (maybe a couple hundred lines changed total) with several useless one-word commit messages like "fix".

And then after the PR has gone through review, most people tack on extra commits with equally-useless commit messages like "addressing feedback".

It's infrequent that I see PRs with commit histories that actually chronicle the history of the change itself; it's more a chronicle of the developer's changing thought processes as they try different things, go down blind alleys, change approaches several times.

Most of the individual commits have test suite failures and some of them don't even compile, so it's impossible to bisect across them if an issue is later found.

In those cases, I wish people would just squash (and some do, where I work). Yes, you lose information and separation, but I'd rather have one large working commit than 15 small broken commits followed by one working commit. Ideally people would curate things before submitting their PR, but I've found that most people just don't care, or don't understand git well enough to even attempt to do it.

Sometimes I toy around with the idea of trying to teach people (what I consider) better practices, and insist they are followed, but we all have a limited amount of social capital in our workplaces, and I'm not convinced this is something worth spending it on.


I religiously squash my commits for each merged pull request.

It seems like madness any other way to me... why have a bigger granularity than a single merge commit?


Interesting, it seems like madness to me to have it the other way around. Why bother with a merge commit if all the changes are squashed into one anyway? Why not just fast forward merge as if it was a commit directly to the main development branch?


Granularity helps with bisecting issues.


At least in GitHub, after you find the squash commit corresponding to the PR, you can restore the branch and bisect the rest in there. This is much faster!


I do read the git commit history, and specifically to find problems like these. Squashing commits indeed sounds like a terrible idea to me.

There are also people who insist you always need to rebase your commits before pushing, in order to get a nice, linear history, and again I disagree. It's fine when you're rebasing a very short (single commit) history, but for a long history, it not only gets very tedious, but when it introduces new bugs halfway through that history, you may not notice. You will notice later, and then tracking that bug you end up in the middle of your own (rebased) history and you may wonder why you ever did something so stupid, when the actual cause of the bug was the merge of two histories at the end of your work.

I do rebase sometimes, but only when the history is short and I can easily see what I'm rebasing. A history longer than 2 commits should not be changed.

I never squash.


Yes, our company squashes commits (we have tens or probably hundreds of thousands on the main branch). Yes, I routinely look at commit history. Commit history is also used for doing analytics for performance reviews, such as how many commits you made, what percentage had tests, etc. For instance, my last performance review made note that I had test coverage on 90% of my commits.

Also, we auto-stamp the PR on the commits so we can get the context. I don't understand how anyone could make sense of a large codebase when all the commits are added with the standard inane commit messages (e.g. "fix it", "typo", "name changes", "tests") that people do when building a feature. I routinely have to look at a piece of code, do a git blame, get the PR in the commit and figure out what was being done.


I squash, but it's not something I do regularly.

In my case, it's generally because I'm preparing training material.

When I do that, I have two branches: dev and master. master is the one that is exposed to students, and dev is the one I use for prepping, testing and staging.

Sometimes, I may even have the dev branch in a private repo, or one that is associated with a different GH ID, like so: https://24ways.org/2013/keeping-parts-of-your-codebase-priva...

The main "gotcha" for me is to make sure that I merge master back into dev after doing the squash. That makes life a lot easier for the next squash.


Yes I do (well, I rebase, not squash blindly), but I do read the commit history. Or well, not so much the complete history, but I do go looking for which commit introduced a particular line, what message went with that change, and what else changed with it.

Of course, that only works if you leave proper commits. The reason people don't is that you only look at your commit history if you've kept it useful; if you've never looked at it because it isn't useful, you don't know the benefits of keeping it that way.


From experience working with new CI/CD systems where one doesn't quite understand what's failing due to... reasons:

It's really quite easy to get into the 50s or close to 100 commits in a branch, which can lead to a horrendously messy history.

There are also people who like to commit before they try something locally, and their branches are effectively a mess.

All of the above is magnified in a monorepo environment, where you might have thousands of commits a day against master if it weren't for enforcement policies that force a squash.


That's not how it works, really. A simplified workflow:

You take a bug or enhancement of a couple of points, work on it in its own branch, then open a merge request and squash the mostly-noise commits into develop/master at once.

A chunk is approximately 3 days, not 3 years. For forward-looking projects, it saves time and works well. "Enterprisey" projects maintaining legacy branches are less well served.


It depends on the situation. If I'm merging a feature branch that has a lot of commits that effectively make up one feature (because a dev had to go back and forth or because there was a lot of feedback), I might squash those commits into one before I merge it into master.

As mentioned elsewhere in the comments here, that also makes it a lot easier to revert said feature if something goes wrong.


We do. PRs are generally smaller than 3 years of development. Sometimes refactoring will be split out, so what could be a single PR with multiple commits instead becomes multiple PRs. This keeps each review focused and ensures that every commit keeps CI passing.


I always squash my commits, and at my current place that is even enforced via Phabricator.


Phabricator has a quite different (I would say better, others would say worse) development flow philosophy to GitHub/PRs.

Phabricator’s preferred model, which is heavily influenced by Facebook, is to forgo feature branches entirely and just stack many small changes on top of each other, landing as and when you want (this doesn’t preclude you working on feature branches locally, of course, because Phabricator doesn’t care what your local checkout looks like).

Because of this, Phabricator considers each diff to be discrete, and if you have multiple changes making up a single feature they should in turn be broken down into separate diffs.

Personally, I think stacked diffs are the killer feature of Phabricator. Unfortunately I haven’t been able to find a similar flow with PRs (recently we migrated from Phabricator to GitHub for one of my projects), you end up fighting against the tool a lot.


Stacked diffs have been a pain for me as well, but recently I found a tool that makes it super easy to implement stacked diffs on top of GitHub! I started using it a month ago and it makes complex code changes so much easier to split up into manageable chunks.

It's called ghstack (https://github.com/ezyang/ghstack)

If you want to learn more you can email me at ericyu3@gmail.com. Also happy to help you get it up and running - just put some time on my calendar at https://calendly.com/ericyu3/15min


You can always land with `--merge`. That is what I do if it actually makes sense to preserve individual commits.


The only time I ever do it is if it's trivial. Say I've removed some vestigial code, then I find some more after I've committed.


Yes, they do. And it is frustrating, especially when you are the new guy and the history of the code doesn't tell you what happened.


I don't think having a history full of "Fix the shaver, maybe yaks don't need 6mm trim" with subsequent "Fix shaver again, yaks need as low as a 3mm trim" with some more intermediate commits help understanding what happened either.


That's much better than looking at a file in a 300 file commit and seeing "merged from XXXX" with no information as to why that one line was changed. I'd much rather spend 30 seconds parsing through the 10 yak shaving commits than have to go trawling through old commits on a file to find the most likely owner of it to ping on slack.


You’re creating a false dichotomy. When people say “squash your commits”, they don’t mean “squash your entire repo into one commit”; they mean “get rid of your ‘typo’, ‘typo fix’ commits”. Your commits should still be small and self-contained.


Actually, I think you're creating a false dichotomy. How many people are committing dozens of "typo/fix build" commits and then squashing just those? In reality, people are squashing the iteration process of "add an X, add a Y, remove the X because it didn't work, and add a Z instead of an X and Y" into "Add a Z".

If you're simply talking about removing "fix typo" commits, then just don't. Just ignore them. You don't need them, but someone might. They aren't hurting you.


When that happens, I usually stick in a comment about it. Generally anything worth committing gets merged anyway, so you still get the "Revert X" commit in there too.


That's not what would happen though, by default the resulting commit message contains the messages from all those squashed into it.


Yep, exactly this. I found early on, while having to work with Perforce and using timelapse view (git blame but with a nice GUI), that lots of intermediary commits like this generate enough noise to discourage even the most determined sleuth from using commit history to solve a crime.

Well-scoped and well-sized commits, squashed and (my personal preference) rebased, provide a commit history that's segregated by actual tickets/stories/features/fixes, which I find navigable very far back.

I'd say squash usage is very much case by case. But yeah equating a feature like squash to large commits is a fallacy. It's a useful tool imo that can be misused, but that doesn't mean the tool is "bad".


I feel like any discussion of squashing or not is only half the story - people need to write good commit messages. A lot of people I've worked with commit too granularly for that single commit to be useful, so squashing gets things grouped more usefully. However, if they still make low effort commit messages, then the history is useless. But it always was useless, squashed or not!


In my experience, it depends on the Git craftsmanship of the team. If they commit every small fix with a useless commit message, history becomes messy real quick. Squashing can be seen as a solution here, until you learn to properly rewrite history with (interactive) rebasing. After that you can put your code changes in whichever commit you want and order them as you see fit. But that also depends on the Git workflow that is used and how many branches are shared among team members.

I had one job in the past that used Gerrit[0] as its Git tool. One of its features is that it creates a "pull request" for every commit in the branch you push, each of which needs to be reviewed individually. This is really annoying if you're used to organising your work in lots of commits to record each step of your development. But from the perspective of the project's Git history it makes a lot of sense, as every commit is one change, one feature, one contained unit added to the main branch. So instead of the main branch containing countless commits with each developer's complete history on a specific feature (where code is added in one commit only to be removed in the next), it contains the features as distinct commits, making them easy to bisect and revert if needed. This looks a lot like squashing, but because you do it before you push your code, you learn to put much more thought into that single commit and its commit message.

[0] https://www.gerritcodereview.com/


Depending on what I'm doing, I'll do squashing, fixups, and commit reordering in an interactive rebase prior to merging, to at least reduce the number of noise commits.
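Git can do the fixup reordering automatically. You commit with `git commit --fixup=<commit>` and then run `git rebase -i --autosquash <base>`. The Python sketch below is a simplified model of how that builds the todo list. A commit whose subject is `fixup! X` or `squash! X` is moved right after an earlier commit whose subject is `X`, and its action changes accordingly. Real Git handles more cases, such as matching by commit hash and `amend!` commits, and this sketch does not model them:

```python
from typing import Dict, List, Optional, Tuple

PREFIXES = {"fixup! ": "fixup", "squash! ": "squash"}


def _parse(subject: str) -> Optional[Tuple[str, str]]:
    """Return (action, target subject) for fixup!/squash! commits, else None."""
    for prefix, action in PREFIXES.items():
        if subject.startswith(prefix):
            return action, subject[len(prefix):]
    return None


def autosquash(subjects: List[str]) -> List[Tuple[str, str]]:
    """Build a simplified rebase todo list of (action, subject) from oldest-first subjects."""
    groups: List[List[Tuple[str, str]]] = []  # one group per picked commit
    group_of: Dict[str, int] = {}             # subject -> index of its group
    for subject in subjects:
        parsed = _parse(subject)
        if parsed and parsed[1] in group_of:
            # Attach the fixup/squash directly after its (earlier) target commit.
            action, target = parsed
            groups[group_of[target]].append((action, subject))
        else:
            # Ordinary commit, or a fixup with no matching target: keep as a pick.
            group_of.setdefault(subject, len(groups))
            groups.append([("pick", subject)])
    return [entry for group in groups for entry in group]
```

After the rebase, each `fixup` is folded into its target with the target's message kept. Each `squash` is folded in as well, but its message is combined with the target's.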



