The modern Git workflow is very simple on its own:

1. Your codebase has a `main` branch which is write-protected

2. Devs submit changes to `main` from their own branches using PRs

3. Devs can do whatever the fuck they want on their own branch

4. PRs are merged one at a time

5. When merge happens a dev's PR is squashed into one commit that gets appended to `main`

6. If the next dev wants to merge a PR with a conflicting change, they have to resolve the conflict first and then merge

7. The end result is that `main` is a linear history of all PRs, each stamped with the time it was committed to `main`, i.e., when it could've started to break prod.
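
Concretely, steps 2-5 boil down to something like the following. This is a rough sketch: the branch name is made up, and in practice the squash merge is done by the hosting platform's "squash and merge" button rather than by hand.

    # dev works on their own branch (steps 2-3)
    git switch -c my-feature          # hypothetical branch name
    git add -A
    git commit -m "implement the thing"
    git push -u origin my-feature     # then open a PR against main

    # roughly what the platform's squash merge does (step 5)
    git switch main
    git merge --squash my-feature     # stages the PR's changes, no merge commit
    git commit -m "My feature (squashed PR)"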

This is a solved problem and it works great in industry. Why break it? If you want interactive rebase then make 2 PRs.



It's flat out amazing how often people want to complicate git.

At my current company, everyone is in love with git flow. I pointed out that the person who originally created it literally wrote a blog post explaining that it was designed for very specific needs that most projects don't have, and yet ...

I constantly see people doing presentations on how this is all supposed to work. And why? You can get away with very simple flows.


People love bikeshedding.

Git is also somewhat symbolic: you're a "real" programmer if you use it and not if you don't. I guess an advanced use of git makes you "realer". I think there's an element of "tidiness" to it as well.

It's definitely more cultural than technical. More tabs vs. spaces than static vs. dynamic typing.


I agree with most of this, except:

> 5. When merge happens a dev's PR is squashed into one commit that gets appended to `main`

I don't know if GitHub supports this, but GitLab has a semi-linear history feature. When enabled, it won't let you merge unless a fast-forward merge is possible, but it never does a fast-forward merge.

This gives you (IMHO) the best of both worlds: your history is pretty linear and easy to reason about. The "first parent" of each commit on the main branch is a linear chain of the main branch history. However, for more complex changes you don't have to squash. Those intermediate commits will be on the second parent of merges, so they're easy to filter out:

    M─┐ [main] merge
    │ o commit
    M─┤ merge
    │ o commit
    M─┤ merge
    │ o commit
    │ o commit
    M─┤ merge
    │ o commit
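
For reference, what that setting enforces is roughly the following (a sketch, with a made-up branch name): the branch has to be rebased so a fast-forward would be possible, but the merge is still recorded with a merge commit, which is why the first-parent chain stays linear.

    # bring the branch up to date so a fast-forward would be possible
    git rebase origin/main my-feature

    # ...but still record an explicit merge commit on main
    git checkout main
    git merge --no-ff my-feature

    # later: walk only the linear first-parent chain of main
    git log --first-parent --oneline main
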
> If you want interactive rebase then make 2 PRs.

What do you mean?


> What do you mean?

Dev 1: "I would like my PR to make 2 commits into `main` instead of the 1 that everyone else gets via squash."

Dev 2: "Ok then make 2 PRs."


It's reasonable to want a series of commits (or patches) to be pulled together, assuming they are well-organized commits.


I prefer to not squash, because it loses useful history, but craft the commits in the PR instead.

A small PR might be 1 commit, but splitting the commits for a bigger PR makes it much easier to write good messages for each change. This is useful in the future when refactoring, and it interacts better with e.g. git blame than having 1 big commit that changed files in numerous places.

This is also useful if a PR is broken. It's easier to revert a simpler commit (if possible) and just fix that than to redo the entire PR with the fix.
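
As a sketch (the hash is hypothetical), that looks like:

    # tidy the branch into well-scoped commits before opening the PR
    git rebase -i origin/main       # reword, squash, or split commits as needed

    # later, if one piece of the PR turns out broken, revert just that commit
    git revert abc1234              # hypothetical sha of the offending commit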


> if a PR is broken, revert a simpler commit and just fix that

I guess it differs between "library development" and "service development".

When you're developing a service, what's in `main` is constantly being tested under the production/customer load. So reverting the known bad PR is a faster fix. And the bug is affecting your customers, so you fix fast and cleanup later.

Typically in industry if a PR is broken you revert the whole thing as fast as possible. But most projects in industry are services.

However, when you're developing a library you'll probably release via tags. So if bad code makes it into `main`, it's not a huge deal, and it might be clearer to just revert the one (child) commit of the PR, because there's no time crunch, no bug in prod, nothing affecting customers.
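
In git terms the two cases look roughly like this (sketch, hypothetical hashes):

    # service / squash workflow: the whole PR is one commit, revert it wholesale
    git revert abc1234

    # merge-commit workflow: revert the entire PR via its merge commit...
    git revert -m 1 def5678         # -m 1 keeps the first-parent (main) side

    # ...or, in the library case, revert just one child commit of the PR
    git revert 9abcdef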

> but craft the commits in the PR instead.

The problem with not enforcing squashing is trust. Think about the worst dev on your team:

Given that trust, are they gonna split their PR into useful commits? Fuck no. They're gonna merge into `main` a bunch of commits saying "fix typo lol xd" and "lol I suck".

If you give them the power to not just make 1 commit per PR, but N commits per PR, are they going to fill up your team's history with crap? Absolutely they will.

Squashing is a useful gatekeeper in that way: it prevents one person from screwing over everybody else and making it harder to use `blame` to root-cause a critical outage.


I work on services too. Typically you don't want to revert the entire PR, just the bit that turns it 'on'. If you make that bit as small as possible, it's often simpler and easier to inspect the correctness of reverting a flag flip or a config option change than it is to revert actual code/config. Reverting a ton of code is what you do when you don't need it anymore; it seems like a bad idea to do it in a hurry (unless you have no choice, but that's a separate point about designing features to have this property, and making that a strict requirement during code review where possible).

Then again, reverting commits in your source tree seems like a poor fix to the problem in any form. What you really want is to roll back to the last known working version (which you hopefully can do without needing source tree changes), rather than to revert one of n changes that may depend on each other and then roll forward.

Especially if you are building a new feature that is riskier or bigger, you will put the main bit of code behind a flag/config option (or perhaps even multiple to incrementally test larger and larger bits).
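
In other words (sketch, hypothetical commit), the mitigation is reverting the tiny flag-flip commit, not the feature code behind it:

    # the feature landed earlier, disabled behind a flag; enabling it was its
    # own one-line change, so mitigation is reverting just that commit
    git revert fedcba9              # hypothetical sha of "enable new checkout flow"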


Sure. I'm just talking about binary deploy. Of course you're gonna use flags to flight things in an industry context, depending on the company and its CI/CD practices.

Let's say the binary deploy (before flighting) is fucked somehow. Then what do you do?


Well, if rollbacks (which shouldn't require touching the source tree) are a thing, then do that. Otherwise you're right: code needs to be reverted, and you hope it works.

But in the 2nd case, I'd make sure to increase the priority of having known good rollback versions available (and rollbacks performable), and also carefully consider what CI/CD could be added to catch more broken binaries (e.g. canary or staging if it's important enough) and what code review practices could have prevented it.


Ok, I agree: you roll back to the known working version. The easiest way to do that is to revert the whole PR (or the data deploy in the case of flags, ofc). My point is not "flags vs. no flags". My point is "each PR should generate one commit, because that's easy to revert".

The commit DAG of git is a cool feature, but it shouldn't show up in `main`. It's so much easier to work with a linear history, one where each commit contains all the required context to figure out "could this have broken something?"


I don’t think reverting the broken code should be on the critical mitigation path at all.

Imagine a scenario where a bug didn't immediately cause an issue, or where your release contains more than 1 new PR. If you suspect the latest version of the binary is broken, your first instinct should be to use a version that isn't. Figuring out the offending change and reverting it should come after the rollback, when you have more time to think.

Deciding whether to revert a change is a tactical question. Often the issue will be that you tickled an unknown bug in a different part of the code. In that case, it's a lot easier to fix forward than to revert the code that tickled the bug and go through the multiple steps of fixing the bug and redoing the change.


Linear history is good, and having multiple commits in a PR doesn’t prevent it. The only change is adding n (ideally well crafted) consecutive commits rather than 1.
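
For example, a rebase plus fast-forward merge lands all n commits while keeping `main` strictly linear (sketch, made-up branch name):

    # rebase the PR branch onto the tip of main, then fast-forward main
    git rebase origin/main my-feature
    git checkout main
    git merge --ff-only my-feature   # appends the n commits, no merge commit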


The problem is who is doing the crafting. (And the approving.) At least the squash-based approach limits the number of commits an untrustworthy "crafter" can occupy to 1.


I think most code review already requires good faith on the part of the reviewer.

I do see what you mean about an untrustworthy crafter. If we want to preserve master history, then the damage from a bad commit chain is worse than from bad code (which can be fixed/undone).

However, I think that the truly adversarial case is rare (and an exception could be made and the master history rewritten in that case). In most cases, your coworkers are hopefully not deliberately trying to sabotage the codebase :). And I don't think the commit chain needs to be a work of art or anything, just mainly avoiding typo commits and the like, so it shouldn't be difficult to do when approached in good faith.


> the truly adversarial case is rare

The problem isn't adversarial behavior or malice. The problem is ignorance and expediency, driven by a desire to push code and little incentive to clean up your git history.

The easy fix is to squash PRs.

The hard fix is to enforce that devs become "crafters" and to define what is and isn't "good faith".


Ratorx, do we work on the same team at Google? I feel like I've heard you before.


Possibly? I’m an SRE. I don’t think my position is too different from the SREs I know :)


Replying to the second bit of your comment:

Commits are malleable before they're in master. If a PR has bad commits, you should treat that the same as bad code and force a refactor of the history. That's enforceable during code review. I'd even do the same if the commits were too big and could reasonably be broken down into smaller changes.
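
Concretely (sketch, made-up branch name): the author reworks the branch and force-pushes it, and the PR updates in place.

    # rework the PR's history after review feedback
    git rebase -i origin/main                      # squash the "fix typo" commits, reword messages
    git push --force-with-lease origin my-feature  # update the open PR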


Mainly agree, except the squashing bit. Squashing means lost history, and you can't tell why a specific method or piece of code was written.


I'd argue "why a method exists" should be addressed with naming and javadocs, not in the commit message. Why split the meaning of the code between the code itself and the commit messages?

And if it's not possible to document inline, the PR docs or code review comments should address this. Then future onlookers can use `blame` to see the context.
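
e.g. something like this (hypothetical file and line range):

    # find the commit that last touched these lines, then read its message / PR link
    git blame -L 120,140 src/Checkout.java
    git show <sha-from-blame>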


Commit messages should describe the change, not the code. This explains why the code was changed from whatever it was before, but not what the code does.

This can be important information, e.g. when troubleshooting bugs, since it can explain the developer's thinking. As with Chesterton's Fence: why on earth would you do something like this?


How would you address bug fixes and all other types of changes?


When picking a git workflow, you should start with your system's and your team's requirements in mind. Don't just copy git-flow or GitHub's simplified version.

Not every team or system uses pull/merge requests at all. Some do pull requests to the development branch, and to master when releasing a new version.

In some teams, code review is the bottleneck, and a developer may have to create a new branch from a PR to keep developing while waiting for code review, then create a second PR that builds on top of the first.

This last workflow is the main reason why I'm opposed to squashing PRs. It just doesn't work.

Also, squashing my carefully composed commits with individual references to DevOps backlog items is insulting.


> merging a sequence of PRs from the same branch is bad if you squash

For me in this scenario, merging `main` (with the squash commit) back into the PR2 branch after PR1 gets merged works.

As an aside: if code review is the bottleneck, it's probably not a huge investment to do some git finagling to get the above to merge cleanly.

But I don't see how this use case is a dealbreaker or even specific to squash...? Won't you have to incorporate/merge other devs' changes to `main` back into PR2 anyhow, regardless of whether PR1 was a squash, rebase, or full history?
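
In commands (branch names are made up), either of these works after PR1 lands as a squash commit:

    git fetch origin

    # option A (what I described above): merge main, including the squash
    # commit, back into the PR2 branch
    git checkout pr2-branch
    git merge origin/main

    # option B: rebase PR2 onto main, dropping the commits that were already
    # squashed in via PR1
    git rebase --onto origin/main pr1-branch pr2-branch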


Neat and Simple!



