I had a similarly hard time remembering most of these (the remaining ones I just don't use often, so I still haven't quite grasped them).
The single thing that made everything "click" together is that most things are just pointers to commits: branch names, HEAD, tags, all of them are pointers.
HEAD is pointing to the commit you're currently looking at
The name of each branch (e.g. `my-feature`) points to the latest commit of that branch
When you're on main and you `git checkout -b my-feature` then you have at least 3 pointers to the latest commit on main: `main`, `my-feature` and `HEAD`.
Every time that you make a commit on `my-branch`, then both the `HEAD` and `my-branch` move to point to the new commit.
"detached HEAD" means that the the `HEAD` (the commit you're looking at) is not pointed at by a branch.
The difference between tags and branches is that the tags point to a specific commit and do not move.
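You can watch these pointers resolve with `git rev-parse` (a sketch to run inside an existing repo; `my-feature` and `v1.0` are hypothetical names):

```shell
git checkout -b my-feature   # new branch pointer at the current commit
git tag v1.0                 # tag pointer at the same commit

# All three names resolve to the same commit hash:
git rev-parse HEAD
git rev-parse my-feature
git rev-parse v1.0
```

After the next commit on `my-feature`, only `HEAD` and `my-feature` move; `v1.0` keeps printing the old hash.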
----
The other thing that caught me out multiple times is that most commands seem inconsistent because git assumes default arguments:
`git checkout file.txt` is the same as `git checkout HEAD -- file.txt`
When you're on `my-branch`, `git rebase main` is the same as `git rebase main my-branch`
The difference is that you can run the latter from other branches too.
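For example, the two-argument form of rebase can be run from anywhere, because git checks out the named branch first (a sketch with hypothetical branch names):

```shell
# Short form: only works while my-feature is checked out.
git checkout my-feature
git rebase main

# Explicit form: works from any branch; git checks out my-feature for you,
# then replays its commits on top of main.
git rebase main my-feature
```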
----
Last but not least, when everything goes wrong, the single command that can take you out of any weird situation is `git reflog` which shows you all the commits that HEAD has pointed to.
Having said all that, I'm glad that the git developers are acknowledging all this confusion and implementing commands with fewer surprises and a simpler interface, as that will make it easier for newcomers to pick it up.
Or to use the name git gives to that concept, "refs." Thus reflog :)
Also, one thing I've found hasn't occurred to most people using git is that all the branch/tag/etc. refs of your fetched remotes are also refs, usable anywhere you can name a ref.
For example, if you ever want to say "I don't care what's on this branch, don't fast-forward or merge or rebase, just overwrite my local branch with what's on the remote!" then that'd be:
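A common way to do that, assuming the branch is `my-branch` on the remote `origin` (hypothetical names; note this throws away local commits and uncommitted changes on that branch):

```shell
# Refresh the local copy of the remote ref:
git fetch origin

# Force the local branch to exactly match it.
# WARNING: discards local commits and uncommitted changes on this branch.
git checkout my-branch
git reset --hard origin/my-branch
```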
> Or to use the name git gives to that concept, "refs." Thus reflog :)
A command name that I read as re-flog for the longest time :D. I really wondered about the strange, strange name for quite a while before I bothered to look up what it does and found out that I should read it as ref-log and that it is, indeed, a very useful thing.
I had the same experience for the longest time until one day I'd have to "re-flog" my local git because I reset some commits that I shouldn't have. Turns out that flogging isn't a real thing in git let alone re-flogging the commits again.
Ah yes, the fact that git brings the remote stuff locally when you `git pull` and all the `origin/<whatever>` are just branch names is something that I realised only too late.
> HEAD is pointing to the commit you're currently looking at
HEAD pointer is pointing to the branch pointer (e.g. my-branch) which is pointing to the commit. (Except in a detached HEAD state.)
> Every time that you make a commit on `my-branch`, then both the `HEAD` and `my-branch` move to point to the new commit.
The HEAD pointer keeps pointing at the my-branch pointer, and only the my-branch pointer moves to point to the new commit. But of course, when you now follow HEAD through my-branch, you end up at the new commit.
> "detached HEAD" means that the the `HEAD` (the commit you're looking at) is not pointed at by a branch.
"Detached HEAD" means that HEAD is pointing directly to a commit, instead of pointing to a branch pointer.
You can have a detached head state, where both HEAD and the branch pointer point to the latest commit. If you use `git log --decorate`, for the latest commit it will show (HEAD, my-branch) instead of the normal (HEAD -> my-branch).
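That decoration difference is easy to reproduce (a sketch; `my-branch` is a hypothetical name):

```shell
git checkout my-branch              # attached: HEAD points at the branch pointer
git log --oneline --decorate -1     # decoration reads (HEAD -> my-branch)

git checkout --detach               # detached: HEAD points straight at the commit
git log --oneline --decorate -1     # decoration reads (HEAD, my-branch)
```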
I love this explanation. One unfortunately confusing extra piece that some people might occasionally run into is that there are two types of tag. Most tags are what you describe: pointers (git ref) to a commit (git object) and nothing more. These are usually referred to as lightweight tags.
There are also annotated tags that can contain a message, have a timestamp, and sha, etc. These are proper git objects that behave a lot like commit objects, except they're still typically only referring to another git object (commit).
When you `git fetch`, git is asking the remote to walk a tree of objects — starting at the commit object that the ref points to — and deliver them to you, to unpack into your own object store.
Git could in theory do a lot with just objects — with the whole "data state" of the repo (config, reflog, etc) just being objects, and then one toplevel journal file to track the hash of the newest versions of these state objects. (Sort of like how many DBMSes keep much of the config inside the database.)
But git mostly isn't designed to do this. Instead, git's higher SCM layers manage their state directly, outside of the object store, as files in well-known locations under .git/. This means that this higher-level state isn't part of the object-store synchronization step, and there must instead be a domain-specific synchronization step for each kind of SCM state metadata where applicable.
Tags are an interesting exception, though, in that while the default "lightweight" tags are "high-level SCM metadata" of the kind that isn't held in the object store; "annotated" tags become objects held in the object store.
(To be honest, I'm not sure what the benefit is of having "lightweight" tags that live outside the object store. To me, it looks like tags could just always be objects, and "lightweight" vs "annotated" should just determine the required fields of the data in the object. Maybe it's a legacy thing? Maybe third-party tooling parses lightweight tags out of the .git/ directory directly, and can't "see" annotated tags?)
Lightweight tags are simply references to commits that lie in refs/tags instead of refs/heads. Annotated tags are references to tag objects rather than commit objects. In both cases, the purpose of the reference is to give the object (tag or commit) a name.
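The difference is visible with `git cat-file -t` (a sketch; the tag names are hypothetical):

```shell
git tag light                   # lightweight: a plain ref to the commit
git tag -a annot -m "v1 notes"  # annotated: creates a real tag object

git cat-file -t light           # -> commit (the ref points straight at a commit)
git cat-file -t annot           # -> tag    (the ref points at a tag object)
git rev-parse "annot^{commit}"  # peels the tag object down to its commit
```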
I think no one uses lightweight tags anymore, except when you push a commit to refs/tags/something by mistake.
> Last but not least, when everything goes wrong, the single command that can take you out of any weird situation is `git reflog` which shows you all the commits that HEAD has pointed to.
And if you're feeling extra-paranoid like me, a `git rev-parse HEAD` and copying that string down somewhere safe before embarking on a tricky process with lots of merge-conflicts or other shuffling.
It's nice to have confidence that I've accurately identified which state is the last-known-good one--not one a few steps too far into the chaos-zone--and that I can usually get back to if everything goes to hell. (Barring unwise use of stuff like `git gc` or `git filter-branch`.)
I use Stable Diffusion (A1111 webui) and sometimes run into config issues, sometimes untracked by git. Nothing more catastrophic than doing `git clean -dfx` by habit from tinkering with Debian packages and knowing that command actually resets everything that `git reset --hard` doesn't...
And then accidentally wiping out all of your checkpoints and generated images :')
Hardlink your checkpoints elsewhere; that way you don't have to keep full copies around. Alternatively, make your models directory a symlink. (I don't think you can symlink the models themselves; I think that'll break it.)
I think that this video should be considered mandatory viewing if you're a developer using Git — the whole lecture starts from the basic data structures involved and builds from there, as opposed to the way that it seems many people approach Git: "What command do I run?"
Git's UI is not meant as an abstraction at all. It is a set of tools to work on the underlying model. And the underlying model is actually quite elegant and understandable. I'd take that anytime over a leaky abstraction like subversion.
AND we should have pseudo-pointers to the current directory layout and to the files already "add"ed (a.k.a. cached/indexed/staged). Git's UI should allow us to do "diff" or any other command on these. I tried to get the git maintainers to name these (DIR and NEXT?) and make them work with the big commands, but they didn't see the simplicity of it all. So instead of `git diff DIR NEXT` we get `git diff`, and instead of `git diff NEXT HEAD` we get `git diff --cached HEAD`, which are much less understandable.
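Mapping the commenter's hypothetical DIR/NEXT names onto today's commands (a sketch):

```shell
git diff           # working tree vs index  ("DIR vs NEXT")
git diff --cached  # index vs HEAD          ("NEXT vs HEAD")
git diff HEAD      # working tree vs HEAD   ("DIR vs HEAD")
```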
>the single command that can take you out of any weird situation is `git reflog` which shows you all the commits that HEAD has pointed to.
Can you elaborate on why that's helpful? I rarely get into a weird state with git but when I do it's almost always faster/easier to just delete the repository, re-clone, and re-apply my changes manually.
Whenever you get into an undesirable state, you usually want to undo whatever you just did. Sometimes it's very easy, e.g. if you created an object, to undo that you just delete the object. Sometimes git explicitly tells you what to do, e.g. if you are in the middle of a merge and you want to cancel it, it tells you to run `git merge --abort`. Sometimes it's not quite as easy, but still straightforward to google what to do, e.g. if you merged or committed to a branch, you just have to look up (or remember) the command to manually set what commit a branch points to.
However, in my experience the most difficult "weird state" to get out of is when you do something that removes/rewrites history. For example: deleting branches, rebasing, squashing, or accidentally getting rid of a reference while attempting to solve some other problem. The root issue is that you want to find a commit that seems like it no longer exists. If you rebase, the branch now points to a new commit that has ancestors you don't want, and the old commit seems gone. The "secret" is that all commits that ever existed still exist; you just can't find them in `git log` because the pointers to them are gone. `git reflog` helps solve this by giving you a list of all commits that HEAD has ever pointed to.
git checkout cool-branch # HEAD points to commit abc123
git rebase main # HEAD and cool-branch point to commit def456, but you realize you don't want that rebase
git reflog # reflog tells you that the commit you were just on is abc123
git reset --hard abc123 # HEAD and cool-branch now point to abc123. You've Ctrl+Z'd the rebase
I just copy paste this and put it in my "break glass in case of emergency" folder. I have many snippets there. This is a good one. I usually refer to the log but reflog is really the "undo" for anything.
If you accidentally symlink the wrong file, would you find it easier to delete the whole directory and restore from backup, or just learn how to change a symlink?
A recent scenario had to do with me rebasing something to squash out some intermediate commits; then I opened a PR, only to realise that I actually needed some of the things I squashed.
With reflog I was able to go back before my rebase and redo it again, more carefully this time. (Unfortunately I couldn't just cherry-pick the thing I wanted)
In general it saves you from having to do what you described with delete/re-clone/re-apply things.
One thing that has really helped me understand git better is the DAG (https://en.wikipedia.org/wiki/Directed_acyclic_graph). Whenever I do `git add <file/folder>`, it is somewhat easier to imagine how new blobs are created and how they are linked.
For beginners it is a fun exercise to understand why empty folders can't be added to git.
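That exercise resolves nicely once you know that tree objects only point at blobs (file contents) and other non-empty trees: an empty directory produces no object, so there is nothing for git to reference. A quick demo (a sketch; `.gitkeep` is just a conventional placeholder name):

```shell
mkdir empty-dir
git status --short         # prints nothing: an empty dir yields no object to track

touch empty-dir/.gitkeep   # the common workaround: put any file inside
git add empty-dir
git status --short         # now shows empty-dir/.gitkeep as added
```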
It's been about a decade since git "won" the version control war due to the (yet another) unjustified tech hype wave, and we're still having difficulties with what should be trivial tasks.
I remember reading a comment by Linus Torvalds being surprised that people started using git directly rather than putting a friendlier layer on top. If it were capable of introspection, the industry would admit it made a mistake by going with git and switch to another VCS instead, rather than wasting huge amounts of time on a tool whose only job is to save text.
I don't see git as a tool to save text; it's for coordinating changes to the same 'text' by multiple people, and doing so quite precisely and reliably, without blocking anyone's path forward. Try that with Word.
It could be better, yes. But I'm always surprised by the hate git gets. It's an amazing tool and miles ahead of the tools we created for non-developers (Word, Google Docs, etc).
Maybe it's because I'm old enough to remember when subversion was king.
> Maybe it's because I'm old enough to remember when subversion was king.
Heh. I'm old enough to remember when CVS was king. Subversion was a huge improvement. Git was too. I have every reason to believe further huge improvements are possible. They might have to be gargantuan though to overcome inertia by now.
And CVS was a huge step up from RCS and SCCS synced via sneakernet, well, it seemed huge at the time, now it seems like a tiny step compared to everything that came after.
> It's a hyperbole, a figure of speech used to make a point. I'm pretty sure everyone on HN knows what git is used for.
I know. The hyperbole makes it sound as if gits job is actually quite simple and we could easily have a better system. But I disagree with that, I don't believe it is simple. Most other tools have and are failing still at this.
> Why would you compare it with SVN rather than other DVCS like Mercurial?
Because SVN ruled the world back then and that is what I and many other devs used. Many of us went from SVN to Git without even knowing what Mercurial was. That explains why I was (eventually) so in awe of git, had I gone from Mercurial to Git I might have lamented the loss of a more friendly system. I do actually remember doing the odd thing with mercurial and it being much smoother to work with, but then git was already becoming the dominant player.
But, be honest, git was better than mercurial as well.
And the reason that it's better is that the support is universal (now). Even back when the battle was being actively waged between git and hg, the popularity of git made it a better choice for pretty much everyone.
Did developers always have trouble with version control and merging? Yes.
In my experience most devs don't grok version control, period. It does not matter if it's SVN, CVS, RCS, Visual Source Safe, Clearcase, Perforce, you name it. Or now git.
Mercurial: I was not impressed. Yes it sort of works but if you know git, you just miss the fact that everything (tags and branches) in git is just a label. At least that's what I missed in the two years I had to use mercurial.
Fossil: Haven't used it but the main advantages touted in the doc you linked to aren't advantages to me. I.e. the only thing I'd want from it are the actual file versioning parts. And then I read things like
> You can say "fossil all sync" on a laptop prior to taking it off the network hosting those repos, as before going on a trip. It doesn't matter if those repos are private and restricted to your company network or public Internet-hosted repos, you get synced up with everything you need while off-network.
Err, yeah, I `git pull` my repo(s) before going on a trip and it doesn't matter if they're private and restricted to my company network and I am synced up with everything I need while off-network. So?
Pijul: Sounds interesting from a first read but I'd take some time to actually read about it and try it out for real.
Veracity: `(C) 2010-2014 SourceGear`. I don't want to be that guy, but: really?
Yes, `git pull` and you have your repo synced. But guess why people are using Github (or alternative forges)? They want issues and maybe documentation/wiki etc. And Fossil has all those features too and that's the "all" you sync. ;)
Personally I subscribe more to the UNIX philosophy here, where I want my tool to solve one thing but be great at being combined with other tools in ways that the original authors probably never imagined.
E.g. I've used `git` in many scenarios.
I've used it with just one other developer where we'd push and pull from each other's laptops via ssh and no central server whatsoever. In that same place we used git for doing backups. We used Bugzilla for bugtracking with that.
I've used `git` w/ `git svn` for about two years in a company that used SVN and nobody knew I was even using and reaping the benefits of `git`, while they were having loads of issues with branching, tagging and conflicts. I showed some people. They wouldn't listen. IIRC we used Rational Clearquest or something like that in that place. It's been a while and it wasn't fun at all.
I've used `git` w/ a central server via SSH. No, not github w/ ssh protocol, just our own central server. Special user whose "login shell" was `git receiving data`. This is the Bugzilla place actually!
I've used `git` w/ a central server via HTTP. Funnily enough this was also the same place as with SSH but ya know, can't expose SSH externally, right? So HTTP it was via an Apache Proxy. Yes that long ago. So yeah Bugzilla place again. But also used this setup at other places that used Jira. Jira 3 mind you. Did we use GreenHopper there? I'm not sure any more. It's been a loooong time.
I've used `git` with bitbucket as the central server. Definitely with GreenHopper in Jira!
I've used `git` with `github` as the central server. Jira Cloud. GreenHopper has now for a long time been "Jira Agile" or "Jira Software" or whatever the nom du jour might be since they bought it.
Do you see the one constant in here? `git`! Because it does one thing and it does one thing well and you can combine it with all these other tools and environments and integrate with them. You don't have to convince anyone that "My tool's integrated issue tracking is the best, throw away your 20 years in tool X and migrate everyone and everything over". You "just" need to convince them, that of course `git` is better than Clearcase, which is uphill battle enough for one year.
Never mind the wikis in use in those places. I don't even remember which was where, necessarily. All the way from none at all, through MediaWiki, MoinMoin, Confluence and Jive (Jive leaves a particularly bad taste as it was in the Clearcase place and used as a company-internal social media platform everyone was supposed to use for 'everything' :shudder:)
I've worked with Fossil for two and a half years, and the same things apply.
- I've used Fossil within a team of three and we were able to sync from each other's laptop without central server. We used it also for bug tracking, with no additional stuff to install.
- I've used it in the same company with larger teams on different projects, on a central server over https or ssh.
- We've worked with github and other external central servers of Git repositories.
There's also one constant here: `fossil` :) It was a nice discovery for me, after Mercurial (which I still use for my personal projects), and they do the job with ease (clean and less error-prone interfaces). I'm not selling you anything and not trying to convince you they're better (I don't care, as I can use them and interact with your git repos transparently); alternatives were asked for and I pointed some out, that's all. :)
Do tell. Your post is missing examples of better alternatives and why they are better.
FWIW, yes I have used others, maybe not the ones you have. I have used some of the ones mentioned in my list of examples, though not all but also others not mentioned.
Weekly, I witness co-workers confused about git, I see posts online asking for help, I see articles like the one here once again trying in vain to explain something that should be simple.
In all my time coding, I don't remember anyone wasting hours trying to undo some mess they'd made in SVN, TFVC, Perforce, etc.
Tools exist to make our lives easier. If they can't do that, they don't deserve our time.
Confusion over svn was standard operating procedure over here for years before teams started using git regularly. I think I'm still the only one here that really understands how svn merge and svnmerge.py work, and have had to detangle plenty of messes over the years.
We'd kind of standardized on trunk-based development because everyone was afraid of even attempting merges.
But git has a very slightly improved merge algorithm compared to svn (so they say), yet merge conflicts were less painful and less likely to happen on svn.
The only real advantage git has is bandwidth and disk space. Not that it uses bandwidth and disk space more efficiently; only that we all have more of both, so it seems more useful than subversion (or whatever else you were using before) did back when limited disk space meant you didn't pass around infinite copies of everything.
I remember seeing people blow up their SVN projects fairly often, usually from trying to move folders around without telling it. Though I don't remember those turning into three developers hovering over one computer the way that git errors so often do.
It is a true power tool. It does need an initial investment to get over the hump but I would never go back to SVN.
It doesn't take a huge amount of time to really learn the basic concepts and then maybe the 10 most used commands (I suggest using git cat-file -p to navigate a repo and history manually). There is a wealth of functionality and an inconsistent terminology, but you can quickly look that up when you actually need it.
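The `git cat-file -p` navigation the comment suggests looks like this (a sketch; `file.txt` is a hypothetical path):

```shell
git cat-file -p HEAD            # the commit object: tree <hash>, parent <hash>, message
git cat-file -p "HEAD^{tree}"   # that tree object: one line per blob or subtree
git cat-file -p HEAD:file.txt   # the blob, i.e. the file's contents at HEAD
```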
The worst thing is that git makes some people NOT do version management. To avoid dealing with git, they'll go https://xkcd.com/1597/ and so major rewrites of code suddenly become one "pull - stage - commit - push - PR" affecting 100s of files…
In 2013, it was far from clear that git would win. There was still a sizable amount of users of darcs, Mercurial, svn, and even cvs. Many of the tools of that era would have plugins to support all of them.
I wish we still had that, because git monoculture also means that anything that replaces git first has to reimplement git. This means that just like ASCII or scroll lock buttons, we're stuck with git mostly forever.
git won for good reasons, it's clearly better than what came before it. It may be popular to shit on it now (similarly for jquery), but when it arrived on the scene it was clearly an improvement.
The way I got into git was: I was working on a project using Subversion and wanted to be able to work on train rides (and internet connectivity on trains wasn't a thing back in 2007).
Initially I tried SVK, but I found that git-svn actually worked better.
In fact, it worked so well, I stopped using "svn merge" (which took 5+ minutes on our repository for every merge), and started using "git merge" with git-svn instead (which reduced the merge time to <3s, and even the extra git<->svn sync overhead cost only 30s or so). As a bonus, git also reported fewer "merge conflicts" (svn at the time had issues repeatedly merging from a branch to trunk).
So when I ended up picking a DVCS for another project, git was the natural choice since I already knew it. I imagine there are a lot of developers who started out on SVN and took a similar route to learning git, so having a high-quality Subversion bridge turned out to be one of the critical features on the road to adoption. This advantage in adoption then snowballed via forges like GitHub.
He talks about what SVN got wrong, specifically, that it made branching easy and it's the merging that's the important piece. Git won because it made merging easy.
> git won for good reasons, it's clearly better than what came before it.
You guys keep covering your eyes and ears, pretending that Git and SVN were the only players on the VCS market. That was not the case.
Sure, SVN had problems and people wanted something better, but Git was far from the being the best alternative for the average software development team.
> It may be popular to shit on it now
I was shitting on it 10 years ago, along with a small minority, and for good reason. Unfortunately, the hype was too strong, and we are where we are.
>It's been about a decade since git "won" the version control war due to the (yet another) unjustified tech hype wave
Hah, I recently wrote a post about a similar issue - why we may be locked into git.
>So what's the issue here? I'm worried that just because GitHub is so good, then unless they decouple from git as letters management engine and allow any/other, then we will be locked with git.
I think the parent comment is trying to say that not every project requires a feature-rich tool like Git. I also suspect that for many, many projects subversion is good enough.
Just being able to create a local branch, experiment with some changes and then either scrap the whole thing or merge them back is a big productivity boost. I feel like with SVN making a branch was like a "big deal" and something you had to think about, whereas in git making a branch is cheap and just part of a normal work flow.
Making an svn branch is not a big deal and you can certainly create a branch in svn and scrap it. It seems that the git hype of 10 or so years ago was very effective.
Making an svn branch and scrapping it, indeed never was a big deal.
Making an svn branch and merging it, now that was a huge issue. "svn merge" sucked compared to "git merge".
For the project I was managing back in 2007 I started using git-svn, because importing Subversion commits into git, using "git merge", and then exporting commits back to the subversion server, was faster and worked better than "svn merge" did (back in 2007).
It's true that it's been a decade since I used SVN so I could be misremembering, I just remember branching and merging being a much larger pain than in git.
Branches were made by copying the directory tree, and svn doesn't have local commits so it always went straight to the server.
I think svn also originally didn't have merge tracking, or at the very least it was such a bad experience that svnmerge.py was created and really common to use. Even once it got good it was mostly the equivalent of using git cherry-pick to pull commits across branches, though it did get a special "reintegrate" mode for diffing a branch and applying that back to trunk (and I remember there being something about it being possible to accidentally undo commits if you weren't fully up to date when running it...?)
Edit: I remember now, all changes on trunk had to be merged into your branch first, or reintegrate would interpret it as if your branch undid those commits and remove the changes from trunk. It basically made trunk look exactly the same as the branch did at that moment.
I know what you mean, when git was gaining popularity I do remember people saying that branching being better/easier in git. I do think it was overstated back then and just not true. I haven’t used svn in years as well, but I think it was mostly due to the hype than any objective merit.
You don't need to lock the tree to deal with merges, you can edit your own commits (e.g. scrub someone from history due to GDPR), it works offline, and it's relatively fast.
Git terminology is a clinical example where many (most? definitely not all) terms make perfect sense once you already understand how it works, but make almost no sense in concert with other terminology or when you don't know the implementation details.
In part, but also, it's because different people worked on different parts of git and came up with different names. Linus originally called it the cache, the most computer sciency term, and then I think Junio renamed it to index, a more DVCS-specific term, but most users called it the staging area, and now the evolution of this term is fossilised into the git UI as well as its internals.
Nice. But technically it's true. origin/master is not master on origin. It just refers to the last known commit pointed to by master on origin, which gets updated when you fetch (pull fetches automatically).
ls -l .git/refs/remotes/origin/master
origin/master is just a file on your system, you can see when it has been changed. It doesn't magically get updated. Do `git fetch origin` and if there are any changes, you'll see the timestamp change, and the contents:
cat .git/refs/remotes/origin/master
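One hedge on the "just a file" part: refs can also be stored packed in `.git/packed-refs`, in which case the loose file may not exist. A way to list remote-tracking refs that works either way:

```shell
git for-each-ref refs/remotes/origin
# prints one line per remote-tracking ref, e.g.:
# <hash> commit refs/remotes/origin/master
```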
The basics of git are so simple, you can implement the core data structures and some operations in a day. It is really worth it to get to know these.
Somehow git has managed to create a very complex user interface on top of quite a simple core.
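As a taste of how small that core is: a blob's ID is just the SHA-1 of the header `blob <size>` plus a NUL byte plus the content, which you can reproduce by hand (assuming `git` and GNU `sha1sum` are available, and a SHA-1 repo, which is still the default):

```shell
echo 'hello' | git hash-object --stdin
# ce013625030ba8dba906f756967f9e9ca394464a

printf 'blob 6\0hello\n' | sha1sum
# ce013625030ba8dba906f756967f9e9ca394464a  -
```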
Consider a bank telling a store I am buying things at "This person has €450 in their bank account", when at that moment I have €310. The store would be rightfully pissed at the bank for effectively lying when it becomes clear later on that the transaction could not be completed and the bank answers "well, the person had €450 a few days prior to you asking us".
Without explicit temporal information, it implicitly means now.
Without an explicit note on the sync status, it implicitly says the sync is up to date.
origin/master is not saying whether the remote has or hasn't changed. It's your local copy of origin/master; it's not giving you the status of the remote. You need to explicitly ask whether master on origin has changed if you want to know.
Which in your analogy would be like the store forgetting to actually ask the bank whether the customer had the money, relying instead on whatever information they have "cached" in the store. The store has to first ask the bank (the remote) if there are any changes.
I do agree that it could be worded better to actually help the user understand, as it seems to be a common misconception.
Sidenote: I'd be driven to absolute insanity if `git status` started doing remote requests to check the remote origin/master status each time I invoked it.
if you have an old bank receipt that says you have $450 in your account, but you actually have $310, you need to "get" a new receipt that has the newest value.
you do that by issuing git fetch origin. then you can git merge origin/master to make everything up-to-date.
what you have is a "paper receipt" (your checked out version) from your bank. something that, if you need an up-to-date version (from another remote), you need to request a new one (by issuing git fetch).
git is, by default, distributed, so whenever you need to see the world outside, you need to be explicit. linus made it this way because back in the day (not sure right now tbf) tons of kernel developers do work without any internet connection, and would only connect to pull/send patches.
this talk[0] by linus from 2007 (i remember watching it on google videos lol) explains really well where the git mentality came from. i really recommend it to you, since it feels like you are not really getting how git works.
If you have a paper receipt (git status) it tells you when it was up to date, so you can determine whether you need a new one. Git doesn't provide that info. That's the problem, not that it can be outdated, but that it omits the date/time!
That is a lie though. Who knows if the remote has been updated? You wouldn't find out about that for 4 hours, until you did a fetch.
It's like going to the hotel desk clerk and asking if you have any messages, then for the next four hours telling people "the front desk has no messages for me" despite not having asked them again in those 4 hours. Things could change!
No, this is exactly like a receipt from the ATM. “Your bank balance is $300.00 as of 10/10/23 10:10.” That was weeks ago, so I know to ignore it. The wording can likely be improved. Maybe “You are up to date with origin/master as last fetched 4 hours ago”.
But you haven't talked to your bank, used an ATM, or been on the app in weeks! Your balance could be totally different - bills have come out, you got paid, interest, etc.
You are making my point! You know to ignore it because its old, outdated information.
Then why are you telling me this? How is it useful to me?
Of course you're up to date with what you last fetched - that is _always_ the case.
Why mention being up to date even? Just tell the user when they last synced with their remote(s).
> Of course you're up to date with what you last fetched - that is _always_ the case.
This might be where the misunderstanding is. You are not always up to date with what you last fetched. Say you have develop checked out, and you run a git pull. As part of that process, git checks the status of all upstream branches, and updates your local reference copy of them (that’s what origin/develop, origin/production, origin/feature-branch-1 are: your local reference copies of upstream). Then you check out production, which you last touched two weeks ago. Git will let you know that your two-week-old local copy is behind origin/production, which is your local reference copy of what it just saw when it fetched from upstream.
You're making their point! `git status` tells you the status as of whenever you last fetched, and omits that timestamp. You can't tell if it's outdated, because it doesn't tell you when the last update was!
> Of course you're up to date with what you last fetched - that is _always_ the case.
But that is not what this message is about. It's confusingly worded, as many people agree, but what it says is that your local ref "main" points to the same commit as your local ref "origin/main." It says nothing about "main" on the other computer/server.
And it is not the case (i.e. you are not up to date with origin/main), for example, when you have committed to main but haven't pushed. It is also not the case when you have fetched but not merged.
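A throwaway sketch of that second case (assumes git >= 2.28 for `git init -b`; repo names are made up). Note that `git status` never talks to the network here: it just compares the local `main` ref against the locally recorded `origin/main` ref.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in for the server-side repo.
git init -q -b main upstream
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m "first"

git clone -q upstream clone
cd clone
git -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m "local only"

# Compares local refs only; reports "ahead", not "up to date".
git status -sb
```

The last command prints something like `## main...origin/main [ahead 1]`, purely from local information.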
I mean, we use whatever the boss tells us to, because that's how a job works?
git has a better experience than cvs or svn if you're far away from the VCS server, but that was solvable by having dev machines near the VCS server. I've gotten used to the git workflow, but it still doesn't strike me as uniformly better, other than that if you're using git, you don't have to deal with everybody always asking why you aren't using git.
Everyone uses git as a centralized vcs. You could remove the distributed part and 99% of people wouldn't notice. The killer feature was branches, which are orthogonal
Case in point. I take it you don't remember (or don't know) that truly centralised systems like Subversion required a network connection just to make a commit? The commit happened in the repository. There was no local clone. You could check stuff out. That was it.
Yes.
I've never had a job in 20 years across 7 companies where I could code and not be on the corporate network.
I am on a personal device, remoting into a corporate network desktop, from which I am then ssh'ing into a linux box.
Walking around with code on a local machine is practically a fireable offense.
For the vast majority of companies signing up for Github/Gitlab/whatever licenses, the remote/decentralized part of git is pointless.
So the decentralized aspects of git just add a layer of complexity/indirection for a lot of use cases. Many extra "git pull"s in my workday.
Yeah tbh I don't understand why you corporate guys use it either. I had a corporate job once. Hated it. They used git but, yeah, it could have been anything centralised. But for my purposes and I guess most people using git it's really important that it's decentralised.
Yeah, I've used subversion a bit. The DAG aspect and commits do not require a distributed system. I'm talking about the idea that git users would pull commits directly from other users and "build consensus" across the network, rather than push and pull from a central repo
> It just refers to the last known commit pointed to by origin/master
The confusion lies in the fact that `origin` refers to different things depending on whether it's `origin master` or `origin/master`. E.g. `git pull origin master` does the thing we expect.
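A quick sketch of the distinction (throwaway repos, made-up names): `origin master` is two separate arguments, a remote name plus a branch name on that remote, while `origin/master` is a single local remote-tracking ref.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master upstream
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m one
git clone -q upstream clone; cd clone

# Two arguments: the remote named "origin", and the branch "master" on it.
git fetch origin master

# One argument: our local remote-tracking ref refs/remotes/origin/master.
git rev-parse origin/master
```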
It's sort of misleading because origin is not the origin, it's your copy of the origin. You have to fetch to make it the same. I can't suggest a better name, but you have to admit the word 'origin' suggests the origin, not some partial copy of it.
It's not a biggy, it's just one of the little toe-stubs and paper cuts you get over pretty quickly.
That's just saying git doesn't connect to the internet (or whereever the remote is) to check for updates without you explicitly telling it to. I think that's a desirable property, though the message could be clearer.
That sounds like the origin/master branch last got an update on XXXXXXXXXX, not that you checked out your local copy then. I would fetch and then be wondering why nothing changed.
Further it does nothing - it tells you nothing about the remote. It could be 1 second or 100 years and you would still need to fetch to determine if anything is different. Then what is the information for?
What I do is I delete the local master branch. I typically can't push to master, or I don't want to work on master directly anyway, so why have a local mutable master branch? So I only need to update origin/master, which I can do with fetch.
How do you make a branch based off master though? I never work on master, but my workflow is `gs master && git pull && git switch -c my-branch-name` so that my branch at least starts off with the latest master, willing to learn shorter/quicker variant of this though
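One possible shorter variant (a sketch, not the only way; assumes git >= 2.23 for `git switch`, and the branch name is just an example): fetch, then branch straight off the remote-tracking ref, without ever checking out a local master.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master upstream
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m one
git clone -q upstream clone; cd clone

# No local master branch required: update origin/master, branch off it.
git fetch origin
git switch -c my-branch-name origin/master
```

This skips the `git switch master && git pull` step entirely, which is the point of deleting the local master.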
I've been using Git pretty much since it came out, and I just learned about "porcelain" this week. I have a project that involves parsing the output of `git status`, and adding the `--porcelain` flag is really helpful. It generates a more concise output that's easier to parse programmatically.
I wondered how many commands have a more machine-readable output, and that led me to the git-scm page "Git Internals - Plumbing and Porcelain". In summary, Git was originally written as a toolkit for dealing with version control, rather than a polished version control system. Many of us who used Git from the early days learned to do VCS work with these lower-level commands, and we've passed those workflows on to many other people as well. This is the "plumbing" layer. Git later developed a more polished layer, referred to as the "porcelain".
I'm not entirely clear yet on which commands are part of which layer, but this helped me make sense of the newer workflows I've seen recommended in recent years. It also gives me a better way of reasoning about possible changes to my own workflows.
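For anyone who hasn't seen it, a minimal before/after (scratch repo; the filename is arbitrary): `--porcelain` trades the human-oriented prose for a stable, line-per-entry format.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main
echo hi > untracked.txt

# Human-oriented output; wording and layout can change between versions.
git status

# Stable, script-friendly output, e.g. "?? untracked.txt"
git status --porcelain
```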
A clear sign that Linus was happy with the internals (plumbing) but thought the UI (porcelain) needed work. Originally, the porcelain was all scripts calling the plumbing. Sadly, they hardcoded that UI to make it faster, without improving it.
I so appreciate Julia’s authorial voice. She does such a great job writing content that’s super valuable for even veteran devs, while maintaining a tone that’s friendly to the newest people in the field and actively includes them rather than gatekeeping.
I think I have a case of "Git Stockholm Syndrome"; I don't find any of these terms terribly confusing, but I think that's in no small part because I haven't really learned any other systems, and have been entrenched in Git since like 2011.
Looking back, I suspect that I was extremely confused when starting out, but was pretending I wasn't to try and seem cool.
Fossil looks neat, but it didn't look "sufficiently better" than Git for me to bother changing, especially due to all the services that support Git out of the box.
Git (not github) was built for bazaar style decentralized development where people might contribute a single patch or two, and it scales really well (see Linux). People don’t even need an account to contribute, they can just send a patch via email.
Fossil was built for cathedral style development, where you’re a small team of trusted contributors. You get offline first issue tracker, wiki, forum, chat etc. out of the box and integrated into an easy to backup solution. Allegedly it doesn’t scale as well though.
With hosted services like Github, I feel like fossil doesn’t buy much more than offline-first capability and simpler use. However, if you self host, fossil is dead simple to run (single binary) and to back up, since it generates a single SQLite file and it’s easy to stream changes elsewhere. Setting up git with gitolite, an email list, an issue tracker etc. is much harder (though I’ve heard gitea is easy to use).
In terms of Fossil as a technology, it's an SCM with built-in project management tools (wiki, forums, bug tracker, etc) so it does much more than git does.
I don't really agree with this assessment of rebase. I think there is value in squashing commits on dev branches into logically distinct sets to keep the log cleaner. In the case where the reason code is doing something that is not obvious I think that explaining the reasoning or the journey to that point in a commit message is clearer than the complete history of trial and error to reach that state.
Someone told me that I really need to check out Mercurial for its binary diffing stuff, and as I've gotten more into 3d modeling in the last year then that might actually buy me something.
However, now that bitbucket has dropped Mercurial support, I'm not entirely sure where I can easily push a mercurial repo for backup. For better or worse, I am extremely dependent on Gitlab to backup my code so I'm not risking my work on a potentially failing hard disk/ssd.
I don't know. For large binary files I still use Google Drive as backup (I know 3D models are not necessarily "large" by today's standard)
One can use git LFS, but there isn't an easy way to free up the storage they occupy in the history. And GitHub LFS is about 5 times more expensive than Google Drive per GB.
These are just CAD-style models for functional robotic parts, not game assets or anything, so they're not actually very large as they're pretty utilitarian, but they do change a lot. As of right now, I'm just pushing the binary files to Gitlab with vanilla git, and at least thus far Gitlab hasn't complained to me.
I figure that the moment Gitlab sends me a nastygram about it, I'll move to S3 or Google Storage or something.
I saw this, but the stuff I'm working on isn't (for the moment) open source, which most of them require. That said, I will give the Perforce one a look.
> ... But it’s actually a little misleading. You might think that this means that your main branch is up to date. It doesn’t.
No need to well-actually this one. Isn’t the Two Generals problem applicable here? If you are being really pedantic, it is impossible to tell whether you are “up to date” right this split second. Even if you do a fetch before `status`. So what’s the reasonable expectation? That the ref in your object database—on your own computer—is up-to-date with some other way-over-there ref in an object database last time you checked.
A simple `status` invocation can’t (1) do a network fetch (annoying) and (2) remind you about the fundamentals of the tool that you are working with. In my opinion. But amazingly there are a ton of [votes on] comments on a StackOverflow answer[1] that suggest that those two are exactly what is needed.
> I think git could theoretically give you a more accurate message like “is up to date with the origin’s main as of your last fetch 5 days ago”
But this is more reasonable since it just reminds you how long ago it was that you fetched.
Another thing: I thought that `ORIG_HEAD` was related to `FETCH_HEAD`, i.e. something to do with “head of origin”. But no. That “pseudoref” has something to do with being a save-point before you do a more involved rewrite like a rebase. Which was implemented before we got the reflog. I guess it means “original head”?
I'm pretty pro-Git, but I agree with the author on the wording of that message (though I disagree with their suggested alternative). "Your branch is up to date with ‘origin/main’" is technically correct, but the usage of the phrase "up to date" implies on a casual reading two things: That main matches origin/main, and that origin/main is up to date (i.e. that it was updated as part of the command, or is being kept up to date automatically and the last successful sync was arbitrarily recently). We're talking about a user-facing status message, not a machine-readable signal that must be true during this CPU cycle. This is a reasonable interpretation without having to get into networking theory.
"Up to date" means caught up in time, as opposed to space. Local branch positions are more like space ("where is this branch pointing?") and remote ref state is more like time ("when did I last update the remote ref?"). I know, that's very subjective. Either can mean either.
Anyway, I think a better wording might be "Your branch matches origin/main" or "Your branch's head is the same as origin/main" or "Your branch is pointing to the same commit as origin/main" or some other tradeoff between verbosity and clarity. Maybe with the author's suggestion as a parenthetical: "Your branch ... origin/main (remote ref last updated 5 days ago)."
>A simple `status` invocation can’t (1) do a network fetch (annoying)
i don't think that's annoying. i want network operations to be explicit, not implicit. when i do git status, i wanna know the status of my repository as is in my file system.
if i want to know what's going on in another remote, i will fetch that and then compare.
The expectation is that the information you get is up-to-date as of the time you started the status command, yes there can be a race in the time it takes to present information to your terminal but that's a small time window. This more or less means you want to fetch, yes. Two Generals problem is only applicable on the remote side, which may keep sending retransmits for a while, if it doesn't get acks that you received its data (which isn't the client's problem). If the client doesn't get data from the server presumably the right behavior (which I'd expect happens now with fetch) is to hang and print an error after a timeout.
It is not - it is the expectation of the phrase "up to date". You don't have to alter reality, you can just alter the description of reality to be a little more conversationally precise.
It's pretty presumptuous to just tell somebody what they think about English is wrong. We can change the wording without changing the system (Though technically we could even alter the system to update the remote ref as part of the status command, and the wording would be much better to a human! Though we shouldn't, since, as you imply, we don't want the status command to be dependent on a network call.)
Edit: Removed accusation of "Conflating the implementation of the system with the wording of its output".
Not to pile on but also the idiots at Bitbucket cooking the "pull request" term.
A "pull" is the action of merging remote changes into a local repo. What the user is actually requesting is for the server to merge her/his remote changes into a branch. Gitlab calls it a "merge request", which is right. I saw someone doing a fetch and their commits disappeared! They were hidden because the repo went backwards in time. Git hides all those tidbits of information in the status, but if you don't use git-prompt or powerline-shell ...you are working in the dark.
The original idea, as enshrined in git-request-pull(1), was that we would all have our own git repos somewhere, kernel.org/git or redhat.com/git, and we would request pulls by emailing each other to pull from each other's repos, across different servers, even different organisations and different domains.
Github took inspiration from git's request-pull command but instead reinterpreted it as merging from one github repo to another.
I thought it originated in a decentralized git use case where developer A asks developer B to pull their proposed new code into their (local) repo. So predating server hosted git. Perhaps I made that up though.
That's generally my point: despite the server presenting itself as a centralized "hub", it is really just another decentralized user of a decentralized git repo.
From a git perspective, there is no meaningful difference between server and client.
i prefer to talk about a project's "canonical repository". so for example, the main project i work on has two primary developers, each of us doing a bunch of mostly independent work. we could set things up so that we simply push/pull between those two repositories.
but that's a much less useful way to work than to denote a repository somewhere as the "canonical" one, and then all those involved pull from it and push to it.
the presence/absence of a server associated with a repo is a bit of a red herring - what matters is the workflow associated with the different repos. my own repo is just a local private one of no particular significance, as is that of my colleague. by contrast, git.ardour.org is canonical (and, as an aside, github.com/Ardour/ardour is merely a mirror).
I like the way L. Torvalds explained here https://www.youtube.com/watch?v=4XpnKHJAok8 where he refers to the circle of trust. A group of developers that each share a copy of the project. Very democratic. Within the circle of trust, of course.
Except that there is no such canonical repository. I can set your laptop as a remote and pull from you. You can set my raspberry pi as a remote and pull from me. There is no such thing as a “canonical” repository in git. Such a concept only exists in the systems built on top of git, if it exists at all.
No, no, you misunderstand. The canonical repository is just an agreement among developers. Nothing more (other than a bit of infrastructure to enable it).
The Git book [0] explains where this term comes from and how it makes sense.
In short, it's not a request to a server but to another person. You're requesting that they pull your branch to take a look at the changes you want to contribute to the project.
Github popularized “pull request” and I think it’s a fine term. Whether you are “actually” pulling from a different repository instead of just doing a “merge request” (idiosyncratic GitLab term) within the same repository doesn’t feel like an interesting distinction.
If you're treating git as a centralized VCS, that is, there is only a singular upstream, perhaps, say, GitHub.com, then that makes sense. That, however, is not the only way to use the tool (though GitHub.com obviously has their own opinions on whether that should be the case or not), but the upstream repository that you're pulling from certainly it's an important distinction if you're using the tool beyond how GitHub.com wants you to.
So you should change your terminology depending on some “how X” your workflow is? If you are working with two repositories between yourself and a teammate then it becomes “pull request”, but then if you move back to the centralized company upstream then you’re doing “merge requests”? The distinction is not interesting enough to, well, make a distinction over.
> That, however, is not the only way to use the tool
And “pull request” somehow is exclusionary? No, because you can use it to talk about both inter- and intra-repository changes.
Yeah, `git pull` is just shorthand for `git fetch` followed by `git merge`, so it's technically a superset of a "merge request".
And it also handles the cross-repo case, which is a common case in the Github model of "make your own personal fork of the upstream repo and send PRs from there," which has advantages -- it allows random people to send PRs without needing to give them permission to e.g. pollute the upstream repo's branch namespace.
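A sketch of the "shorthand" claim (throwaway repos, made-up names; this is the default fast-forward case, ignoring `pull.rebase` and similar config):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main upstream
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m one
git clone -q upstream clone
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m two

cd clone
# These two commands...
git fetch origin
git merge -q origin/main
# ...are roughly what a plain `git pull` does in one step.
```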
You pull when you want code from their repo, they pull when they want code from your repo. You don't have permission to push to their repo, so instead you request that they pull from yours.
No you pushed it to your own GitHub repository. So they need to pull from your repository into theirs.
Though I agree the situation is somewhat muddied by the fact that you can create pull requests for branches in the same repository (even though that's not the normal workflow). GitLab's "merge request" terminology is more accurate for that use case.
So... technically no. It really is a "pull" request from Github's perspective. A PR is (1) a bunch of review/comment/history/hook tracking in the web/cloud backends and (2) a single git branch name somewhere else on github, writable by the author, which contains the changes.
The underlying action of merging a "pull request" is to pull from the submitted branch to the target branch. It's no different than Linus doing it from a maintainer branch.
I'm watching Jujutsu / https://github.com/martinvonz/jj as a git-compatible wrapper/porcelain that adds some important features and fixes a bunch of what is confusing about git:
It explicitly models conflicts, so you can e.g. send your conflicts around and aren't forced to deal with them immediately.
It tracks and logs all repo state including repo operations (checkout, rebase, merge, etc) and those state changes can be manipulated and reverted individually just like commits.
The working copy is automatically committed, and operations typically amend that latest commit. This alone like halves the number of unintuitive concepts new users need to learn. Combined with robust history rewriting tools this makes for a much better workflow.
I've been using it for a month or two on top of my work git repos. It's great, but I have a few complaints. It doesn't support creating git tags, it doesn't support git submodules, and if you ever drop a big file in your working directory that isn't already git ignored, it will be absorbed into the "index repo." Despite that stuff, it's still a better experience than git.
The way in which the author structured the language used to describe things may be the most confusing thing of all. For example:
"HEAD^ and HEAD~ are the same thing (1 commit ago)"
Followed by:
"But I guess they also wanted a way to refer to “3 commits ago”, so HEAD^3 is the third parent of the current commit, and HEAD~3 is the parent’s parent’s parent."
The author's language implies a contradiction they immediately prior said doesn't exist. If these two distinct constructs were indeed different ways to define the same relationship, the second paragraph would say "^3 and ~3 are both ways of saying the third parent of the current commit, or the parent's parent's parent." Instead, they've defined the constructs as different once again.
> If these two distinct constructs were indeed different ways to define the same relationship, the second paragraph would say "^3 and ~3 are both ways of saying the third parent of the current commit, or the parent's parent's parent."
They're not the same thing. Merge commits can have multiple parents, and HEAD^3 refers to the third parent. If HEAD is not a merge commit, then HEAD^3 doesn't refer to anything. HEAD~3, by contrast, refers to following the parent's parent's parent, independent of how many parents any of the commits in question had.
> HEAD~3, by contrast, refers to following the parent's parent's parent,
More specifically, following the first parent only. man git-rev-parse (surprisingly) has a nice explanation:
> A suffix ~<n> to a revision parameter means the commit object that is the <n>th generation ancestor of the named commit object, following only the first parents. I.e. <rev>~3 is equivalent to <rev>^^^ which is equivalent to <rev>^1^1^1.
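The difference is easy to see on a merge commit (scratch repo, made-up branch names): `HEAD^2` jumps to the merge's second parent, while `HEAD~2` walks two steps back along first parents only.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main
g() { git -c user.name=a -c user.email=a@example.com "$@"; }
g commit -q --allow-empty -m base
git branch side
g commit -q --allow-empty -m main-2
git switch -q side
g commit -q --allow-empty -m side-2
git switch -q main
g merge -q -m merge side        # HEAD is now a merge with two parents

# HEAD^2 is the *second parent* of the merge: the tip of "side".
git rev-parse HEAD^2
git rev-parse side              # same commit

# HEAD~2 follows *first* parents twice: back to "base".
git rev-parse HEAD~2
git rev-parse HEAD^1^1          # same commit
```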
The "Missing Semester" is a collection of video lectures from MIT that covers the fundamentals of command line, git, vim, and other tools often overlooked in computer science courses. As a self-taught developer, I found it to be an ideal introduction to git.
The series made me also pick up vim, and I have not looked back since.
My constant git commandline annoyance is that some commands take remote branches as `origin mybranch` and some take `origin/mybranch` .. maybe there is an arcane reason for this, but I've never seen it.
Because they are referring to different things. `mybranch` is a reference in your local repository. `origin/mybranch` refers to a branch on a remote repository that you call `origin`; your local `origin/mybranch` is just a convenient representation (a remote-tracking ref) of that remote branch.
So, if we take the example from derefr [1], `git checkout foo` lets you go to your own local branch `foo`. Then, `git reset --hard origin/foo` modifies the current local ref (`foo`) to be the same as `origin/foo`, and changes the working directory accordingly.
Is 'origin/mybranch' referring to the local copy of the remote branch? Why can I run 'git checkout origin/mybranch' (which results in a detached HEAD) but not 'git switch origin/mybranch'?
Also, git reset appears to take a commit as the final parameter:
git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
That tends to happen a lot with git and other fads in the tech industry. See the discussion here where people are defending how git has the misleading message of “ Your branch is up to date with origin/master.” A lot of rationalizations working backwards.
> some commands take remote branches as `origin mybranch`
That isn't a single argument, that's two arguments. If you look at docs for some commands which accept `origin mybranch`, git fetch for example, it says the first argument is `<repository>` which can either be a URL or a remote name. In your example it's a remote called `origin`.
The 2nd argument is then a `<refspec>` which is a bit complex - it specifies what to fetch and where to fetch it. So in your example `mybranch` is shorthand for `mybranch:mybranch` (i.e. fetch `mybranch` from `origin` and update local `mybranch`). You can even do `git fetch origin mybranch:mybranch-local` which would fetch `mybranch` from `origin` and update `mybranch-local` with it.
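That last variant, sketched end to end with throwaway repos (all names made up):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main upstream
git -C upstream -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m one
git -C upstream branch mybranch
git clone -q upstream clone; cd clone

# <repository> <refspec>: fetch "mybranch" from "origin" and create/update
# the *local* branch "mybranch-local" from it.
git fetch -q origin mybranch:mybranch-local
git rev-parse mybranch-local
```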
This one gets me constantly even after using Git for nearly a decade. My workflow where I try it one way and then the other is uncannily similar to how most people have to flip a USB at least once to get it plugged in. No idea why it's like this.
I think origin/mybranch is a ref to a commit that was the current one when you last did a git fetch. `origin mybranch` fetches the latest commit from the origin.
Coming up on 10 years ago, I got completely fed up with git's bullshit. I decided what commands I needed to get my job done, and then implemented them as a set of aliases in my .gitconfig.
It currently has 72 aliases and another 35 lines of defaults. My pride and joy is "git extract", a 700-character shell script that uncommits a single file from the latest commit while preserving the staged and uncommitted changes.
No one else in the world speaks my bizarre git dialect, but with my .gitconfig at my side, I feel like a git wizard. Take this file away from me, and I can barely commit a change. I have no regrets.
I am very curious about your "git extract" script; what does it do exactly? I was thinking a bit about it, and this is what I came up with to "uncommit" a specific file by reverting it to the state it was in 2 commits ago and then amending the last commit:
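A sketch of roughly that approach (scratch repo; filenames are made up). Note this is much cruder than the "extract" described above: it discards the file's change entirely rather than preserving it as staged/unstaged work.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main
g() { git -c user.name=a -c user.email=a@example.com "$@"; }
echo v1 > file.txt
g add file.txt && g commit -q -m "first"
echo v2 > file.txt
echo other > other.txt
g add . && g commit -q -m "second"

# Restore file.txt to the version before the last commit, then fold that
# into the last commit, removing file.txt's change from it.
git checkout HEAD^ -- file.txt
g commit -q --amend --no-edit

git show HEAD:file.txt    # back to "v1"
git show HEAD:other.txt   # other.txt's change survives the amend
```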
It creates a temp branch and then pushes two commits onto it, one for the currently staged changes and one for the currently unstaged changes. It then checks out the original branch and reverts the file by checking out the previous version and then merging it into the top commit. Next, it checks out from the temp branch the changes to the file and also cherry-picks the staged and unstaged commits. Finally, it calls reset a bunch to restore the staged and unstaged state. I'm sure there is some edge case where it's completely broken, but it works very reliably for me.
The reason I wrote it is that I have a workflow where I build up a single commit gradually by adding changes once they are "done" and ready for the final code review. I think most people instead create a series of commits, rather than one, and then squash-merge them all at the end.
What I like about my approach is that at any time I have: committed changes that are complete, staged changes that still need a little cleanup (e.g. documentation or tests), and then unstaged changes that are what I'm working on right now. I may also edit the commit message as I go along.
It's not common, but I use the extract command when I've committed something but decide I want to revert it in part or in whole because I found a different way. Again, having all my committed changes in the top commit helps.
The GitHub Desktop app changes some terminology to make it more usable, for example "Undo" instead of "reset HEAD~". I typically make aliases for this sort of thing in my terminal, but it would be great if some of those made it to the Git CLI.
> Imagine that for some reason I just want to move commits F and G to be rebased on top of main. I think there’s probably some git workflow where this comes up a lot.
This comes up for me in a branch where I have some necessary local changes which are permanently there and have to be maintained, and that branch experiences non-fast-forward changes from upstream.
Say we are up-to-date in this branch, plus our two local commits that are not in upstream.
We do a fetch. Upstream has rewritten 17 commits. So now we have 19 diverging local commits. We only care about two of them. We just want to accept the diverged upstream commits, and then rebase the two on top of that.
We do not want to rebase all 19 commits!
We can do:
git rebase HEAD^^ --onto origin/master
So HEAD^^ is the "upstream" here that we have explicitly specified. So following the documentation:
- All changes made by commits in the current branch but that are not in <upstream> are saved to a temporary area.
[That's precisely our two local commits; those are the ones not in HEAD^^]
- The current branch is reset to <upstream>, or <newbase> if the --onto option was supplied.
[We did specify --onto, so the current branch goes weeee... to origin/master, the abruptly updated, non-fast-forward upstream that we want to catch up with.]
- The commits that were previously saved into the temporary area are then reapplied to the current branch, one by one, in order.
[That's the cherry-picking that we want: our two local commits.]
So end result is that our branch is now the same as origin/master (up-to-date) plus has the two needed local commits.
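The whole walkthrough can be simulated locally without a network (a sketch: a local `master` branch stands in for origin/master, and all file names are made up):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master
g() { git -c user.name=a -c user.email=a@example.com "$@"; }
echo base > base.txt; g add .; g commit -q -m base

# Local branch: some soon-to-be-rewritten upstream history + 2 local commits.
git switch -q -c local
echo old > upstreamfile; g add .; g commit -q -m "old upstream commit"
echo one > local1; g add .; g commit -q -m "local 1"
echo two > local2; g add .; g commit -q -m "local 2"

# Meanwhile "master" (standing in for origin/master) was rewritten.
git switch -q master
echo new > upstreamfile; g add .; g commit -q -m "rewritten upstream commit"

# Replay only the two commits after HEAD^^ onto the rewritten upstream.
git switch -q local
g rebase -q HEAD^^ --onto master

git log --oneline   # rewritten upstream history, with local 1 and 2 on top
```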
Further, I have a situation in which there are two git repos on the same machine which have a different version of a local commit. One of the repos (A) is an upstream for the other (B).
When B does a fetch, it gets the master branch from A with A's local commit, which is unwanted in B. B has its own flavor of that commit:
git rebase HEAD^ --onto origin/master^
takes care of it. We catch up with origin/master, but ignoring one commit, and on top of that we cherry pick ours.
I love Git, but with the huge caveat that it is relative love - relative to the universe of garbage software trying to perform complex tasks. I think it has a few glaring issues (the reset command and the overloaded pages of the official docs come to mind), but I also think 80% of the criticism aimed at the design or CLI of Git is undeserved or misapplied. E.g., here's my quick critique of some of the points under the "alien mental model" part:
> A commit is its entire worldline
> Commit content is both a snapshot and a patch
> Branches aren't quite branches, they're more like little bookmark go-karts
These are three versions of the same fallacy - technically correct, but only in the same sense that a text file is "both text and ones-and-zeroes", or "not text, but actually ones-and-zeroes". The author is pointing at different abstraction layers, some of which aren't even necessary parts of the user mental model. If you don't yet understand concepts like "abstraction layers" or "implementation details" (e.g. the difference between "a branch is a series of commits" and "a branch is represented by a pointer to a commit, which, by the definition of a tree, resolves deterministically to a series of commits"), then you will have this problem with any software that gives you power to work on a complex problem like version control.
> Merge conflicts are actually just difficult
If your version control system makes merge conflicts easy/rare in the absolute sense, then it is doing dangerous/cute "idiot-proofing" that will bite you at some point.
In summary, I think a good chunk of complaints about Git are actually just complaints about version control (i.e. Git is hard because version control is hard), or unnecessary combinations of different abstraction layers (which, to be fair, most software/documentation hides better from the user, but IMHO that is a bad thing). Git is an amazing piece of software (relatively speaking).
> I don’t really know what this means, I’ve always just used whatever the default is when you do a git clone or git remote add, and I’ve never felt any motivation to learn about it or change it from the default.
This is a refspec[0], and it tells Git the relationship between local references and remote references, defined in the pattern [+]<src>:<dest>. So, in this case, it defines the local head of main to refer to the remote branch main on origin. You could, for example, define all branches on local as linked to all branches on the remote with +refs/heads/*:refs/remotes/origin/*.
An example of a real world usage of this was when we were migrating our repo between providers -- configuring the fetch field to link a different remote for different branch names kept things simple for our developers as we quietly moved branches to the new remote.
It also configures the scope of a git fetch command, iirc, so you can restrict your fetches to only scopes that you care about (maybe your team has a branch name prefix/namespace that you can specify, so when you git fetch you only get things that are relevant and not some other teams' branches you don't need).
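As a sketch of that last point (the remote URL and the `team-x` branch prefix are hypothetical), narrowing the fetch refspec to one team's namespace looks like this:

```shell
set -e
cd "$(mktemp -d)" && git init -q
# Hypothetical remote; never actually contacted in this sketch.
git remote add origin https://example.com/repo.git
# Replace the default refspec so `git fetch` only pulls team-x/* branches:
git config remote.origin.fetch '+refs/heads/team-x/*:refs/remotes/origin/team-x/*'
git config remote.origin.fetch
```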
I'm usually the "git guy" so having something like this to refer to when people ask me about such things will be really useful. Thanks!
One of the problems I have is figuring out how people get themselves into such situations. Like if someone submits a merge request and their branch looks like it's been merged into itself with one side rebased or something. When I ask them what they did they have invariably forgotten. If someone cooked a dish that was too salty I'd be able to tell them "try putting in less salt next time". I wish I could do the same with git.
Although upon typing this I'm thinking perhaps I could figure it out by looking at their reflog? But that involves accessing their computer or walking them through it which would probably confuse them even more.
"under the hood git rebase main is merging the current branch into main (it’s like git checkout main; git merge current_branch"
This is wrong. 'git rebase main' doesn't affect main at all! If you are on main and check out a new branch, then add commit-one, then switch back to main and add commit-two, then switch back to the new branch and 'git rebase main', your new branch will have commit-one and commit-two, and main will still only have commit-two.
It's much more like simply 'git merge main', with some extra magic to avoid a separate merge commit.
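The scenario described above is easy to check in a throwaway repo (assuming Git ≥ 2.28 for `init -b`):

```shell
set -e
cd "$(mktemp -d)" && git init -q -b main
git config user.email you@example.com && git config user.name you
echo 1 > a; git add a; git commit -qm base
git checkout -q -b new-branch
echo 2 > b; git add b; git commit -qm commit-one
git checkout -q main
echo 3 > c; git add c; git commit -qm commit-two
git checkout -q new-branch
git rebase -q main
git log --format=%s new-branch   # commit-one, commit-two, base
git log --format=%s main         # commit-two, base -- main didn't move
```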
Git’s arcane command syntax is a thing of legend and after years of using it I still find myself being mystified by so many things once I need to delve beyond the usual.
Sorry if it’s already been mentioned but this fake git man page generator gives me a big stupid grin every time I use it:
https://git-man-page-generator.lokaltog.net/ (I’m sure it’s been linked on HN before)
The documentation it generates is just so plausible, with the kind of verbs and nouns it uses, and how the help text is always descriptive but never really helpful.
Git is a technical marvel, and due to its immense usefulness many people have mastered it and have become more productive because of it.
But that doesn’t mean it’s not confusing, both over- and under-documented and unpredictable.
To me it’s a bit like seeing someone work with highly specialized systems (think: CAD or a particle collider): who am I to say it “doesn’t work”, these people are designing cars and discovering bosons. But still I think: if someone could design these things again from scratch, they could likely be even better.
When learning Git, you have to forget the English definition of every word you use, then there's some pain, then you are born anew, and you can do in 3 keywords what it would take 14 words, 8 brackets, and some combination of dollar sign, underscore and dot before some symbols if it was designed like PowerShell, or that would be impossible to do if it was designed like cmd.exe.
I honestly never wrapped my head around the `git branch` and `git switch` commands and continue to use checkout for everything branch-related because it's what I've been doing since 2007. I haven't given them a fair shake, granted, but to me it's kind of like complaining that Unix's mv is confusing because it also renames files.
`git restore` though is slightly nicer than using checkout. I've incorporated its use into my workflow, even though I've previously used checkout for the same operation.
I've also been using git since around that time but had no problem _switching_ in 2019 because I'd also spent many years using svn. Choice is good, switch just fits my brain better.
That’s not quite right, is it? HEAD is simply a pointer to whatever commit is currently checked out.
HEAD can point to a commit that is the end of a branch (typical state, HEAD is “attached”), but it can also point to one that isn’t (HEAD is “detached”).
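One way to see attached vs. detached directly: `git symbolic-ref` resolves HEAD only while it points at a branch (scratch repo, empty commits just to have history):

```shell
set -e
cd "$(mktemp -d)" && git init -q -b main
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m one
git commit -q --allow-empty -m two
git symbolic-ref HEAD                      # refs/heads/main -- attached
git checkout -q HEAD^                      # check out a bare commit
git symbolic-ref -q HEAD || echo detached  # HEAD no longer names a branch
```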
This was always super confusing to me. I'm happy that fork.dev, instead of "ours"/"theirs", just names the branches. So much clearer. I know that's a GUI and not the command line, but no other GUIs I've seen have done so.
I find this one annoying because I generally don't want to run `git pull` -- I almost never `git pull`, I usually just `git fetch` and update my branches as necessary. I do wish there was a built-in shortcut for "try to fast-forward this branch"; I often just do a rebase or a merge, which will do the right thing for a fast-forward, but won't fail if a fast-forward is impossible. I can do `git merge --ff-only`, but I would like it if `git fastforward` or `git ff` was available instead because for me it is such a common operation.
To forestall the obvious -- I don't like to make custom aliases or commands (though I've done it in the past) because it makes it harder to migrate between environments.
That's even worse because it will just do the wrong thing when I'm on a different setup. At least with an alias the system can say "yeah, I ain't never heard of 'ff'" so I can fall back on a default.
Very useful to make a git alias for that. When git bitches at me about the unknown command on the work servers, I know I can just use the full merge command.
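For reference, that alias is a one-liner (repo-local config used here so nothing global is touched; drop into `--global` if you want it everywhere):

```shell
set -e
cd "$(mktemp -d)" && git init -q
# After this, `git ff` attempts a fast-forward and fails loudly if it can't:
git config alias.ff 'merge --ff-only'
git config alias.ff
```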
I’m not convinced this matters. It’s very important that some things be kept easy and accessible even if you don’t do them very often. I don’t call emergency services very often, but when I do, it had better be trivially easy. I don’t spin up a new server very often, but when I do, I need to be able to function there without immediately installing a bunch of config. I don’t help colleagues who need git assistance very often, but when I do, the last thing I need is to mess something up for them because all my aliases and defaults are missing.
Dear AI, it would be nice to just use an English prompt, like: 'put the work I just did onto the Develop branch, even though I forgot to make a separate branch for it first'. Somebody must be making a Git-AI, right?
`commit -a`:
> Tell the command to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected.
`add -A`:
> Update the index[...] This adds, modifies, and removes index entries to match the working tree.
So commit -a won't track new files, but add -A will.
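A quick demonstration of that difference in a scratch repo:

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo 1 > tracked; git add tracked; git commit -qm initial
echo 2 > tracked   # modify a file Git already knows about
echo 3 > brand-new # create a file Git has never seen
git commit -qam "via commit -a"  # commits the modification only
git status --porcelain           # ?? brand-new  (still untracked)
git add -A                       # now the new file is staged too
git status --porcelain           # A  brand-new
```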
I guess I've never really found the "ours" and "theirs" terminology too confusing -- it's called "ours" because that refers to the branch I have checked out right now (i.e. that's "my" branch), and "theirs" because that's the other branch, the foreign one which isn't currently "mine".
But maybe that's because I almost never use rebase, where it apparently is switched?
That's exactly why the author included it. Your definition seems appropriate at first and is probably what most people assume: "ours" refers to "my current branch" (HEAD), and "theirs" refers to "the other branch".
This breaks down during rebase, where the terminology gets reversed. The definitions are more accurately:
- "Ours" refers to "the branch whose HEAD will be the 1st parent of the merge commit" (during merge) or "the branch who will get commits applied on top of its HEAD" (during rebase)
- "Theirs" refers to "the branch whose HEAD will be the 2nd parent of the merge commit" (during merge) or "the branch whose commits will be applied on top of the other branch" (during rebase)
This gets a little more complicated during "octopus merges", where there are multiple "theirs" branches.
I'm constantly having to double-check directionality for pretty much everything in Git. I find none of the terminology or ordering intuitive, and never trust myself to have remembered it correctly. Ditto whether various commands that take or imply ranges are inclusive or exclusive.
Ah, but it's not switched! Remember that rebase essentially replays commits against a different branch/base commit:
e.g. if you have your branch `feature-1` and want to rebase it on `main`, then you would do `git checkout feature-1; git rebase -i main`
Git will then switch to main (that's "ours" now) and then it will replay the changes from `feature-1` on top of it (that's "theirs" now) - like cherry-picking all your commits, but in sequence (and not actually merging them in `main`)
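The flip is easy to reproduce; here the conflicted file is inspected from both sides mid-rebase (throwaway repo, hypothetical names):

```shell
set -e
cd "$(mktemp -d)" && git init -q -b main
git config user.email you@example.com && git config user.name you
echo base > f; git add f; git commit -qm base
git checkout -q -b feature-1
echo feature-side > f; git commit -qam feature-change
git checkout -q main
echo main-side > f; git commit -qam main-change
git checkout -q feature-1
git rebase main 2>/dev/null || true        # stops on the conflict in f
git checkout --ours f;   ours=$(cat f)     # main-side: "ours" is the new base!
git checkout --theirs f; theirs=$(cat f)   # feature-side: our commit is "theirs"
git rebase --abort
echo "ours=$ours theirs=$theirs"
```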
> Remember that rebase essentially replays commits against a different branch/base commit
This is exactly the reason that git is hard. Its abstractions are so leaky that you'd think its interface is designed to be a sieve. I really like the underlying model git uses, but the actual CLI does a terrible job of providing mechanisms to use it, to the point where you have to pay far too much attention to the internal model to be able to avoid footguns. It's an indictment of a poor API when the best way to figure out how to do something isn't to search the docs but to figure out how to express the thing you want to do as an operation on the underlying model and then Google that to find the invocation that happens to map to that operation (and half the time, it's not even its own subcommand; it's just some obscure flag to a grossly overloaded subcommand like `checkout`).
I can't quite find it, but there's a video on git internals and how you can make commits without using the porcelain git CLI, by directly manipulating the .git folder. That really helped me deal with certain idiosyncrasies of the git CLI (either I guess a hacked-together sequence of git commands, or I know what to nuke).
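The exercise is roughly this: build a commit out of plumbing commands, skipping `add`/`commit` entirely (still shelling out to git rather than hand-writing the `.git` files, but it makes the object/ref model visible):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
blob=$(echo hello | git hash-object -w --stdin)             # store file content
git update-index --add --cacheinfo 100644 "$blob" hello.txt # stage it by hash
tree=$(git write-tree)                                      # snapshot the index
commit=$(git commit-tree "$tree" -m "made by plumbing")     # wrap it in a commit
git update-ref refs/heads/main "$commit"                    # point a branch at it
git log --format=%s main
```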
Git has proven to be impervious to designing a discoverable visual UI. All the ones I have used are fine if you understand git. But put a naive user in front of a git GUI and they are almost as confused, just differently. It would be wild if git turned out to be something that just can't be made visual. But I suspect the task of making git n00b friendly was never taken seriously.
TIL (again) that the newer and supposedly nicer way to “take all the files in PATH back to how they were at COMMIT, without changing any other files or commit history” is
git restore --source=COMMIT PATH
I have learned and tried this before but still have the muscle memory of doing `git checkout COMMIT -- PATH`.
There is a slight difference in behavior though: with restore, your index (staging area) won't be changed. Pass "--no-overlay" to checkout to get the same behavior as restore.
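The index difference shows up directly in `git status` (scratch repo, hypothetical file `f`):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo v1 > f; git add f; git commit -qm v1
echo v2 > f; git commit -qam v2
git restore --source=HEAD^ f   # worktree back to v1, index untouched
git status --porcelain         # " M f"  (change is unstaged)
git checkout HEAD^ -- f        # old way: updates the index as well
git status --porcelain         # "M  f"  (change is staged)
```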
As with all these posts, use the git lola alias regularly and all becomes clear as you can see where all the branches locally and remote are and how they change when you do things.
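For anyone who hasn't seen it: `lola` isn't built in, it's a popular alias, commonly defined roughly as below (exact flags vary from dotfile to dotfile):

```shell
set -e
cd "$(mktemp -d)" && git init -q
# Graph of all local and remote branches, one line per commit:
git config alias.lola 'log --graph --decorate --pretty=oneline --abbrev-commit --all'
git config alias.lola
```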
Not terminology, but what bites me most is the 100MB limit. I know I shouldn't commit large files, but sometimes I miss one, and if you miss it and keep committing afterwards... I tried all those Java-based tools, but at the end of the day I had to copy all my changes somewhere else and hard reset, or even reclone. No easy way out.
Although, the author seems a bit scared of reflog. I find it very helpful in a lot of situations, because it gives you the history of the state of your navigations around the commit tree - not the history of the tree itself.
Git is so integral to software development, I find it hard to imagine anything else. But I firmly believe that there must be a better way. Whatever people used before git (perforce?) was working fine, let's use that.
There have been several solutions; git is just the latest. The switch decision has several factors; it's not a quick or easy pick. Do you need tighter file security? Does your team generate/store files larger than N gigabytes? Are you writing integrations? Are you debating microservices vs monolith? And a big one, do you need official support?
(Disclaimer: I work at Perforce. Happy to answer any questions!)
My biggest gripe about Git, currently, is its poor to non-existent support for renaming files. Basically, it doesn't track file renames at all.
True, in its current architecture it has no way to do it since it doesn't have a daemon or any background process which tracks user actions in real-time, all tracking is ad-hoc, only when git is invoked.
So it basically just side-steps the question, and internally a renamed file is recorded as a file which was deleted and another file which was created, "magically" in the same commit.
When you run merge or log, git tries to guess that a file was actually renamed based on similarity statistics between the text representations. When small, similar files are changed, this leads to false conflicts.
I'm not talking theoretically here. I'm talking about something that has caused me, personally, and my organization, many developer hours and real money. I'm talking about something that has hurt productivity and confused me and my users. I'm talking about something that has caused me, just this week, to be called in to work, since this happened on a critical project and I, as the "git expert", was deemed the only one with the know-how to deal with those conflicts.
I have spent 2 weeks during covid lockdown, 2 years ago, writing a complicated function which silently identifies those false renames and fixes them before calling git merge. This eliminated many, many support calls. Just last week I have fixed what I hope was the last bug with this function, a nasty corner case.
However this is specific to my situation, where the files are json files that were serialized from the database, and each one of them has a globally unique ID that users can't change. So I'm able to verify which files were renamed in which side and automatically make a "fixup" commit on each side so the 2 sides are as equal as possible (leaving real conflicts in place).
The situation isn't good since this function only works on a single branch merged from the remote. It doesn't work yet between branches. In fact I ought to begin working on changing it to work between branches as I am writing this very comment...
I have many, many other gripes about git, but many of them have been voiced on the comments in this page. But I generally mourn the fact that it is the de-facto source control solution in its current form. Since now its maintainers, quite justifiably, won't break backwards compatibility, and are mostly into fixing bugs and adding some small new features that are QOL improvements.
I still think that some of the bigger pain points could be addressed without breaking backwards compatibility. Similar to what was done with splitting the checkout command to switch and restore, while still keeping the original command.
E.g. for the rename problem above:
1) An optional daemon could be introduced that tracks renames in real time and records them, both for display purposes in the log and, more importantly, to avoid false conflicts during merges/rebases.
2) And/or, users could mark during merges that have conflicts which files were actually renamed to which others, and at which point in history (perhaps by modifying a generated input file, like rebase does), to help git merge/rebase make more informed decisions instead of relying on statistical similarities.
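For contrast with those proposals, here's the current heuristic machinery in action: rename "tracking" is purely after-the-fact similarity matching at log/merge time (scratch repo, made-up file names):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
seq 1 50 > old.json; git add old.json; git commit -qm "add file"
git mv old.json new.json; git commit -qm "rename file"
git log --follow --format=%s -- new.json  # both commits; rename is inferred
git show -M90% --stat HEAD                # reported as a rename by similarity
```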
Don't worry, git manpages have a builtin glossary, just type in `git help glossary`, and you can get helpful definitions for confusing terms:
> index
> A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.
Correction: you can get unhelpful definitions of all the confusing terminology.
You have described my experience with man pages, and in general with developer-written documentation. I think part of it is simply that a lot of devs struggle to communicate what they know to people who don't already know what they know. I also suspect lack of interest is a big reason (writing the code is fun, writing the docs is boring).
A lot of the time the language in man pages is fine, but it could be easily clarified for first-time readers with simple examples. What is it with that? It seems to be an unwritten rule that Man Pages Shalt Not Provide Examples. Is there a good reason for this, or is it a "secret club"-type thing, or something else?
EDIT: I "let me google that for you"'d myself and found this [0] right away which explains it, I guess.
I imagine it as a competition between myself and an adversary with an uncooperative attitude, one who's prepared to act smarter, dumber, better informed and more ignorant than I am in order to find gaps, inaccuracies, or ambiguities in the docs I write.
It doesn't hurt that it improves my understanding of whatever it is that I'm documenting -- and where it could be improved.
I write docs with my most terrifyingly dangerous adversary in mind: Future Me. His forgetfulness and capacity for ignorant destruction are unparalleled, at least that's what Past Me says.
A lot of documentation is atrocious, particularly for newer stuff. But I kind of disagree on man pages. At least, when I first encountered Unix, it was on old SunOS machines, and I remember being impressed first that everything was documented, and second, that the documentation was comprehensive and useful.
The things that make me most frustrated are when docs are missing, incomplete, or useless ("--foo: enables the foo option"). When it comes to man pages, at least in the good old days, I felt like I could always rely on them to cover all of the possible inputs, outputs, and errors, and to describe them, if not in the plainest language, in a way I could understand without being the author of the program.
There have never been "just man pages" in Unix. E.g. V7 had a number of papers (in troff format which could be rendered to the terminal or sent to the printer in finite time) that helped with understanding how to use the system (assuming you had more than the bare minimum disk space on your machine).
Yes, and I bought some of those at the campus bookstore. What I meant was that if you want to know how to do a recursive grep or something, man was the main resource. There was no google or stack overflow.
The trouble with a lot of technical documentation, particularly man pages and in-tool command-line help, is that it usually seems written as a reference/reminder for people who know more-or-less what they are doing already. You can often work things out from a more standing start, but that sort of documentation isn't optimised for this, so you have to work for your elucidation.
For popular technologies there are often books, tutorials and other introductory articles. Git certainly has enough information to start out without reading every manpage.
I basically never read man pages. In the overwhelming majority of cases you spend 20+ minutes reading the manual and it ends up not answering your question (or at least this is my experience) so you end up just doing a web search anyway.
As this might not be obvious to all, tldr is a crowdsourced man-page replacement, and also really useful when working with git.
$ tldr git cherry-pick
Apply the changes introduced by existing commits to the current branch.
To apply changes to another branch, first use `git checkout` to switch to the desired branch.
More information: <https://git-scm.com/docs/git-cherry-pick>.
- Apply a commit to the current branch:
git cherry-pick commit
- Apply a range of commits to the current branch (see also `git rebase --onto`):
git cherry-pick start_commit~..end_commit
- Apply multiple (non-sequential) commits to the current branch:
git cherry-pick commit_1 commit_2
- Add the changes of a commit to the working directory, without creating a commit:
git cherry-pick -n commit
The single biggest contribution StackOverflow made to documentation as a whole was to flip the script.
Instead of the original developer of the material trying to prognosticate what questions people would have, people asked questions and either other users or the developer could answer them.