
The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours good hacking) and turn it into a coherent story of how that feature is implemented. This means splitting it into commits (which change one thing), giving them good commit messages (describing the one thing and its effects), and putting them in the right order.
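As a concrete illustration of that cleanup step, here's a minimal self-contained sketch (made-up file names and messages, in a throwaway repo) that collapses two messy WIP commits into one well-described commit by scripting the interactive-rebase todo list:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

# Two messy checkpoint commits from a hacking session.
echo "v1" > feature.txt
git add feature.txt
git commit -qm "WIP: start feature"
echo "v2" > feature.txt
git commit -qam "WIP: more hacking"

# Mark the second commit as a fixup of the first, then reword the result.
GIT_SEQUENCE_EDITOR='sed -i "2s/^pick/fixup/"' git rebase -i --root
git commit --amend -qm "Add feature X with its state in feature.txt"

git log --oneline    # a single coherent commit
```

In real use you'd run `git rebase -i` with your normal editor and reorder/split/reword by hand; the scripted editor here is just to make the sketch reproducible.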

Rather than hiding bugs, I usually wind up finding bugs when doing this, because teasing apart the different concerns that were developed in parallel in the hacking session (while keeping your codebase compiling and your tests passing at every step) tends to expose codependence issues that you wouldn't find when everything's there at once.

It's basically a one-person code review. And when you're done you have a coherent story (in commits) which is perfectly suited for other people to review, rather than just a big diff (or smaller messy diffs).

It also lets me commit whenever I want to during development, even if the build is broken. This is useful for finding bugs during development, as you'll have more recorded states to work with, e.g. to find the last working state when you screw something up. And in-development commits can be more like notes to myself about the current state of development rather than well-reasoned prose about the features contained.

I realize not everyone agrees with it, but I hope I've described some good reasons why I think modifying history (suitably constrained by the don't-do-it-once-you've-given-your-branch-to-the-public rule) is a good thing, not something to be shunned.



I agree with you, but only for local commits that haven't been pushed to a shared repo.

Rewriting local history seems no different than rewriting code in your editor.

Rewriting shared history is (almost) always bad.


I like "Rewriting local history seems no different than rewriting code in your editor", that's a pretty good analogy I hadn't thought of.

There are a (very) few instances where you'd want to rewrite something pushed to a shared repo. One is if there's a shared understanding that that branch will be rewritten. Some examples would include git's own "pu" and "next" branches: "pu" is rebased every time it changes, and "next" is rebased after every release. Everyone knows this and knows not to base work off these branches. There's also the occasional "brown paper bag" cleanup, like some proprietary information getting into the repository by mistake, where all the contributors have to cooperate to get it removed. But all of these require out-of-band communication somehow.


We've been fine using rebase on already pushed branches. This comes from the understanding that a feature branch belongs to one developer, ever, and that no one else is supposed to work off of it (or at their own peril).

Everyone knows that it's "my branch" and that they're absolutely not supposed to use it for anything until it's merged back into master or whatever authoritative branch.


If you're having a person own a branch, implementing it in their fork would probably make more sense: https://www.atlassian.com/git/tutorials/comparing-workflows

We use the forking model for bigger projects with more developers, and the branching model for smaller projects. It works out very nicely.


Ok, that makes sense... but then why bother pushing the branch in the first place?


For me, it's because I hop between development machines, and pushing/pulling a branch is much easier than the alternative of synchronizing files manually among said machines.

Also, so that if something goes awry with my dev machine for whatever reason, at least my work is saved.

Also, to make it easier for a colleague to review my code before it gets merged into something.

Also, because it means I can use GitHub's PR system instead of doing it on my machine (thus providing some additional record that my code got merged in, and providing an avenue for the merge itself to be reviewed and commented on).


We have a rule that you never go home at night without pushing your work, even if it's garbage. Put it in a super-short-term feature branch if needed, and push that, but don't leave it imprisoned on your machine.


There are people who follow this rule, and there are people who think disk failures are what happen to other people.

Few things sting as bad as losing hours or days worth of work.


And there are people who have good backup systems.


It allows builds off of that branch, so you can get test feedback etc. It also acts as sort of a backup or a sync if you switch machines.


Code reviews -- you can create a PR on the pushed code, make fixes in response to the comments, rebase, and re-push.


I work on multiple machines. Pushing my branch up even if it's busted code means I can continue work easily on other computers.


Immediate backup

(I hope I'm not alone in saying this...)


I kinda hope you are, because backup and source control really should be separate functions. Obviously your source control repository should be backed up, and pushing stuff into it acts to create a backup, but you really should have a separate backup system at work as well, to cover unpushed code as well as all the other useful info contained on your computer.


I use it the same way too. I do not really see why backup should be separate from source control as there is no valuable information on my (work) computer apart from the source code, and I never spend more than a few hours without pushing.


Backups of your work computer would close that hours-long window between pushes.


You are not


Does anyone advocate rewriting shared history? Oddly, I see this "exception" a lot in replies to this person, but I'm not sure I've ever read anyone anywhere saying rewriting shared history is a good idea.


I think it's less people saying you should rebase shared history, and more people saying you should rebase without realizing that shared history matters. Then some poor confused soul starts always rebasing before pushing/merging, messes up their local history, and does not know how to fix it.

A lot of git is "magic" to many developers, and the way rebase works is certainly one of its most poorly understood features.


Only in extreme circumstances where something sensitive (such as credentials) or otherwise problematic (such as other people's copyrighted assets, or .svn directories in the case of some repos that were moved from SVN to Git in a ham-fisted manner) was checked into the repository and needs to be removed. Those are the only reasons for rewriting shared history.


My rule of thumb is that rewriting shared history is always, always bad. There may be situations where the proper precautions can mitigate the risk, but I've never seen a good example where it's actually a completely good idea without downsides.


> I agree with you, but only for local commits that haven't been pushed to a shared repo.

Yes, that's why Git doesn't allow you to push rewrites, at least not without '--force'.
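To see that refusal in action, here's a self-contained sketch using a throwaway local bare repository standing in for the shared remote (branch name `main` assumed). Note that `--force-with-lease` is the safer variant of `--force`, since it refuses to overwrite remote commits you haven't seen:

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git clone -q "$base/origin.git" "$base/work" 2>/dev/null
cd "$base/work"
git config user.email you@example.com
git config user.name "You"

echo one > a.txt
git add a.txt
git commit -qm "WIP"
git push -q origin HEAD:main

# Rewrite the pushed commit...
git commit --amend -qm "Better message"

# ...and an ordinary push is rejected as a non-fast-forward:
git push -q origin HEAD:main 2>/dev/null && echo "plain push: ok" || echo "plain push: rejected"

# --force-with-lease only overwrites if the remote is where we last saw it.
git push -q --force-with-lease origin HEAD:main && echo "forced push: ok"
```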


> Rewriting shared history is (almost) always bad.

Agreed. The one counterexample that I have is Github pull requests. Those are actually branches in your fork, and you do want to rewrite those when you get feedback on a pull request. That makes it easier for the owner of the repo to do the merge later.


Why do you need to rewrite? If a pull request is not completed, you can continue to push it and the PR is updated to pull the latest commit.


I will get pull requests where later commits fix bugs introduced in former commits.

I generally ask people to rewrite such PRs, as I’m not going to pull known buggy commits into master, even if they are followed by fixes. That is just noise.

It might also be that some commits in the PR have changed tabs to spaces or vice versa.


I think the point was: if you have a PR with two commits, you can squash it to a single commit and force push. This will update the PR to just have the single commit. (Similarly with a rebase.)


sorbits' point was in response to:

clinta > Why do you need to rewrite? If a pull request is not completed, you can continue to push it and the PR is updated to pull the latest commit.

sorbits is saying that no, you really should rewrite your PR.

You, hayd, seem to be merely reiterating sorbits' point.


Making 'temporary' commits and rewriting local history before pushing to a shared repo has analogs in other revision control systems:

* In Subversion, people track patches using tools like quilt to manage them before actually putting them together into a commit.

* In Mercurial, people use `hg mq`, which is like a more featureful version of `git-stash`.

These are basically all ways to track a series of patches prior to 'committing' them into the code base shared with others.


Speaking of `git-stash`, I've always thought of `git-stash` as a less featureful version of `git branch stash`.
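That analogy holds up surprisingly well under the hood: a stash isn't its own kind of object, it's literally a commit hanging off `refs/stash`. A small self-contained sketch:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

echo base > f.txt
git add f.txt
git commit -qm "base"

echo wip >> f.txt
git stash -q

# The stash is an ordinary commit reachable via refs/stash:
stash_type=$(git cat-file -t refs/stash)
echo "refs/stash points at a $stash_type"

git stash pop -q   # and popping it restores the working tree
```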


I don't think I've ever seen anyone advocate rewriting shared history.


I've come across reasons, but they've always been pretty marginal, such as somebody checking in sensitive credentials without realising what they were doing.


I think I would like the ability to edit commit messages for typos without having to force everyone to reset --hard.


The thing is, the commit message is part of the commit, not something separate from it. Irritating as it might be, this is good for traceability.

What I do to avoid that is work on a separate branch, rebase against master, then review the commits on my branch after getting rid of any WIP commits and shuffling them around to make more sense. Finally, I make sure the commit messages are (a) accurate and (b) have no typos. Once I'm satisfied with that, I merge.

I treat merging as a big deal, but not committing.


Agreed, most people hear "rewrite history" and immediately assume "public history".

Rebase is a part of code review. If someone spots a typo and a "fix typo" commit follows it up, as happens in a good proportion of GitHub-model projects, I cringe. This information is utterly useless to the project's history, and should be rebased away as a fixup. Only once code review is done should a commit be considered for merge. It's at this point that rewriting becomes a problem.

I think most people have forgotten where Git came from; Git is designed from the ground up for this! When someone emails a series of patches to the kernel mailing list for review, they iterate that series of commits over and over until it's ready. They don't keep adding new patches on top the way the pull request model of GitHub/GitLab etc. does.
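The "fix typo" follow-up can even be recorded as an explicit fixup commit and folded in automatically before merge. A minimal sketch (hypothetical file and messages, throwaway repo):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

printf 'helo world\n' > doc.txt
git add doc.txt
git commit -qm "Add doc"

# Review spots the typo; record the fix as a fixup of the original commit.
printf 'hello world\n' > doc.txt
git commit -qa --fixup HEAD

# Before merging, fold the fixup back into its target:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root

git log --oneline    # just "Add doc"; the typo never appears in history
```

`--autosquash` reorders and squashes the `fixup!` commit for you; the `true` editor just accepts the generated todo list unchanged.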


In my Github experience, rebasing/tidying your commits is expected before a Pull Request is merged, just like your description of Linux development. Eg, the numpy/scipy/matplotlib projects.


Unfortunately, this is not true for many repositories. GitHub's interface (i.e., the "Merge" button) encourages users to merge from the web interface, where this tidying can't happen.


Then someone else rebases over that commit, there's a conflict, and lo! the tests fail. Why? A typo. It's fixed in the subsequent commit (which you can't see). Lovely.

There's something to be said for having every commit pass tests/work (or, if it doesn't, saying so explicitly in the commit message), if anyone is ever going to step over this commit.


That's a hard one; trying to make a single commit in a pull request helps me but sometimes even then a pull request gets ignored and they want me to rebase it.

The problem is they ask /me/ to rebase it; I think they should take a little ownership in the potential rewriting of history.


There's no potential rewriting of history before they merge your pull request, only a series of unaccepted draft commits. :)


Another nice side benefit is that you are able to use git bisect to find bugs more easily. If some of the commits fail the build then it becomes difficult to separate commits that actually introduce a bug from those that are just incomplete.

The team I work with has recently started making sure every commit passes the build, and it's had some fantastic results in our productivity. We know every individual commit passes on its own. If we cherry-pick something in, it's most likely going to pass; so if it fails, the problem is usually in that specific commit, not one made days or weeks ago.
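To make the bisect payoff concrete, here's a self-contained toy sketch (the "test" is just a grep on a file) where `git bisect run` pinpoints the commit that introduced a bug; this only works cleanly when every commit is buildable and testable:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

for i in 1 2 3 4 5; do
  echo "change $i" >> log.txt
  if [ "$i" -eq 3 ]; then echo "bug" >> log.txt; fi   # commit 3 introduces the bug
  git add log.txt
  git commit -qm "commit $i"
done

git bisect start HEAD HEAD~4
# Good = the bug marker is absent; bisect walks the history for us.
git bisect run sh -c '! grep -q bug log.txt' > /dev/null

first_bad=$(git show -s --format=%s refs/bisect/bad)
git bisect reset > /dev/null
echo "first bad commit: $first_bad"
```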


You don't have to rewrite history to do this. You just have to run your tests before committing. You know, like people used to in the old days.

Indeed, I think the widespread rewriting of history that goes on in the Git world makes it more likely that there will be failing commits, because every time you rewrite, you create a sheaf of commits which have never been tested.

Now, in your case, it sounds like you have set up processes to check these commits, and that's absolutely great. Everyone should do this! But why not combine this with a non-rewriting, test-before-commit process that produces fewer broken commits in the first place?


Running tests before committing locally adds a lot of friction. It often happens to me that I work on a feature in component A, and in doing so, realize that it would be great to have some additional feature in component B (or perhaps there's a bug that needs to be fixed).

As long as the components are logically separate, it's usually a good idea to make those changes in separate commits. While you can do that using selective git add, I personally often find it more convenient to just have a whole bunch of rather small "WIP" commits that you later group and squash together in a rebase.

Not least of the reasons is that I like to make local commits often in general anyway, even when I know that the current state does not even compile. It's a form of backup. In that case, I really don't want to have to run tests before making commits.

And obviously, all of this only applies to my local work that will never be used by anybody else.


When you come up with the idea for a feature in component B, or a bug to fix, rather than implementing it, make a note of it, and carry on with what you were doing. Once that's done and committed, you can go back to the other thing. That way, you end up with coherent separate commits, that you can test individually as you make them, without having to rewrite history. Not only that, but you can give each commit your full attention as you work on it, rather than spreading your attention over however many things.

Again, this is the traditional way of doing things (as an aside, in pair programming, one of the roles of the navigator is to maintain these notes of what to do next, so the pair can focus on one thing at a time). Seen from this perspective, history rewriting is again a way to cover up poor, undisciplined programming practice.


It's possible that we just have different styles of working.

Still, to clarify: Not all, but some of the situations I have in mind are situation where the changes in component A cannot possibly work without the changes in component B.

So an alternative workflow could rather be: Stash all your changes made so far, then do the changes in component B, commit, and then reapply the stashed changes in component A. That's something I've tried in the past, and it can work. However, it has downsides as well. In particular, having the in-progress changes in component A around actually helps by providing the context to guide the changes in component B. So you avoid situations where, after you've continued working on component A, you realize that there's still something missing to component B after all (which may be something as silly as an incorrect const-qualifier).

It's also possible that our preferences depend on the kind of projects we're working on. What I've described is something that has turned out to work well for me on a large C++ code base, where being able to compile the work-in-progress state for both components simultaneously is very useful to catch the kind of problems like incorrect const-qualifiers I've mentioned before.

I could imagine that on a different type of project your way works just as well. For example, in a project where unit testing is applicable and development policy, so that you'd write separate tests for your changes to component B anyway, being able to co-test the work-in-progress state across components is not as important because you're already testing via unit tests.


I agree that the situation where you need the changes in B to make the changes in A is both genuine and annoying!

I have often taken the stash A - change B - commit B - pop A - finish A route. If you know what changes to B you need, it's fine, but you're right, the changes to A can be useful context.

In that case, you can make the changes to B with the changes to A still around, then stash A, run the tests, commit, pop A, and continue. Then you can have the best of both worlds, and you still don't need to edit history.

If you just can't make the changes to B without the changes to A, then they probably belong in a single commit, and you've just identified a possible coupling that needs refactoring as a bonus.
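That middle route doesn't even require shelving B's changes: `git stash push` takes a pathspec, so you can stash only component A, test and commit B on its own, then pop A back. A sketch with hypothetical files standing in for the two components:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

echo a0 > component_a.txt
echo b0 > component_b.txt
git add .
git commit -qm "base"

# In-progress work touching both components:
echo a1 > component_a.txt
echo b1 > component_b.txt

# Shelve only A, test B's change alone, commit it, then resume A.
git stash push -q -- component_a.txt
# (run your test suite here)
git commit -qam "component B: add the prerequisite change"
git stash pop -q

git status --short   # component_a.txt is modified again, ready to finish
```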


Yeah, obviously we do that (well maybe not so obvious to some, but I never push unless the tests pass). We sometimes perform lots of other things like static analysis that get in the way of a rapid feedback loop. We also run mutation testing, which can sometimes take several hours for the whole codebase -- although we don't have this run on every commit, just ones that we merge into a specific branch.

The problem I have with non-linear commit history is that I find it impossible to keep all the paths straight in my head when I am trying to understand a series of changes. Maybe you can do that, and I think that's awesome, but I like to see a master branch and then smaller feature branches that break off and then combine back with master.


A tool that does not naively sort the commits by date but groups linear parts of history together should allow for a better overview.


Maybe, but testing does not prevent all bugs and what happens once bisecting is needed still needs to be considered.


> The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours good hacking) and turning it into a coherent story of how that feature is implemented. This means splitting it into commits (which change one thing), giving them good commit messages (describing the one thing and its effects), and putting them in the right order.

To my understanding, Gerrit does grouped commits as part of the flow. Even better, it groups all review-triggered commits under the same master commit, with the nice, extensive description that one carved for the PR. It's regrettable that GitHub popularized the fork/pull-request model instead.

https://www.gerritcodereview.com/


> The point of rebasing for clarity, IMHO, is to take what might be a large, unorganized commit or commits (i.e. the result of a few hours good hacking) and turning it into a coherent story of how that feature is implemented.

Isn't this the same rationalization that drives Git Flow's feature branches and merging via --no-ff ? You can see the messy real work in the feature branch, but it gets merged to the main branch as one clean commit.


Once the merge commit occurs, the 'messy real work' is now part of the main branch's history just as much as the rest of the commits, as they are ancestors of that merge commit.
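Right: with `--no-ff` the feature's commits remain ancestors of the merge; only a first-parent view of the log hides them. A quick self-contained sketch:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

git commit -qm "base" --allow-empty
git checkout -qb feature
git commit -qm "WIP: messy step 1" --allow-empty
git commit -qm "WIP: messy step 2" --allow-empty
git checkout -q -
git merge -q --no-ff -m "Merge feature X" feature

# The first-parent view looks clean...
git log --oneline --first-parent
# ...but the messy commits are still reachable through the merge:
git log --oneline | grep -c "WIP"
```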


Same here. It is much clearer to me to reapply my commits, as long as I constrain myself to clear, coherent, atomic commits.

Replaying changes is much more comfortable for me, especially when I still have them in short-term memory; it is surely easier than merging other people's stuff within your files.

My average feature is around 7-10 commits, all replayed on the latest commit on the branch. It forces me to catch up with other people's work on shared areas and gives me quite a bit more confidence that the merge isn't messing up problematic files.


Precisely.



