What's amazing is that you can also use git locally, without a server, using "git daemon".
I've created a small little bash function for that, and then I just pull from machine to machine (or IP to IP) directly without needing to bother with the internet. Git is very useful this way on hackathons or when there's not much internet bandwidth to begin with.
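For reference, a minimal sketch of such a helper (not the author's actual function; it serves every repo under the current directory on git's default port 9418, and `--enable=receive-pack` also allows pushes):

```sh
git-serve() {
    # export all repos under the current directory over the git:// protocol
    git daemon --verbose --export-all --reuseaddr \
        --base-path=. --enable=receive-pack
}
# a teammate on the LAN can then clone with:
#   git clone git://<your-ip>/<repo-name>
```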
You don't need a daemon. You can push and pull to any URL, including file URLs.
As long as you have access to the URL (e.g. local file permissions, or network authentication with a Samba server or a shared folder on another PC), you can use it as a remote.
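For example (a sketch; the mount point and branch name are made up), any path you can read and write works:

```sh
# clone straight from a shared folder
git clone /mnt/shared/project.git

# or add it as a remote to an existing checkout
# (the shared copy should be a bare repo if you want to push to it)
git remote add shared /mnt/shared/project.git
git push shared main
git fetch shared
```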
Including ssh, of course. `git clone $USER@example.com:/tmp/blah.git`, and `git init --bare` (for a non-checked-out, non-working-dir, "just the .git folder" file location).
Back when GitHub was in its infancy, gitweb was the cream of the crop for UIs, our tracker was either Trac or Redmine, CI was a Hudson/Jenkins hellscape, the word "cloud" referred to water in the sky, reliable VMs were a distant dream, and servers were not cheap, our team of three or four people set up decentralised git over ssh.
Each developer workstation had a git user with ssh enabled and restricted to some git invocation I can't recall, a conventional path set up with chgrp git / chmod g+rwS, and remotes named after team members. PRs were literally that: either emails, or shouting across the desk that someone could git pull from one's machine, straight from their (non-bare) repo.
The whole development process was entirely decentralised; anyone's machine was as worthy as the next one, and there was no single point of failure.
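A rough sketch of what that per-workstation setup might have looked like (paths and names are guesses, and git-shell is just one way to get the restricted invocation):

```sh
# on each workstation: a git user restricted to git commands, plus a group-writable path
sudo useradd -m -s /usr/bin/git-shell git    # git-shell path varies by distro
sudo mkdir -p /srv/git
sudo chgrp git /srv/git && sudo chmod g+rwS /srv/git

# on a teammate's machine: every colleague is just another remote
git remote add alice git@alice-box:/srv/git/project
git fetch alice
```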
This reads like a chapter out of my own biography.
This experience started on a self-hosted SVN + Hudson server (not a VM). That server was repurposed to run ESX, which then hosted a VM of its former self. Which felt a bit pointless.
Then we moved to git from svn but kept everything else the same. Had Groovy scripts coming out of our build and deployment ears. *shudder*
and if you start bragging about the "decentralization"... well, there is a reason everyone now uses a gitHUB, where there is a proper, centralized, version online
> where there is a proper, centralized, version online
Linux has a proper, centralized, version online. Still, their collaboration workflow (they use the email workflow) is decentralised.
I think everybody uses GitHub now because they don't know the email workflow, and it's more convenient to have one account on GitHub than having one account on every instance of GitLab you contribute to. And I guess most people only know the GitHub web interface and don't really feel like using something else.
My point being that there are many reasons why everybody is using GitHub, but it does not mean that the PR workflow is better. What do you think?
All my current customers are using Bitbucket, but it's the same.
They may look like centralized repositories but they are not. We have our own local repositories, sometimes even more than one for the same project, and we use the GitHub or Bitbucket one only to sync some branches between developers.
It's not like it used to be with centralized systems, where the only copy of the repository was on the server. Locally, developers only had the files, sometimes with a global lock so that nobody could work on the same file at the same time.
That's the real advantage of having a system like git in combination with a server like GitHub.
The PRs are nice, but not everybody uses them. Sometimes it's only merge and push.
One other distinction I would make between this and the email-centric workflow that (almost) every mailing list I know of has these days is the inclusion of a publicly accessible URL with an index of messages.
The LKML mirror has a page with links to the various lists, but if you dig around in those links, there doesn't appear to be a way to post new items or reply to existing messages in a browser. It's just text; the mechanism of action is still sending emails to various handles to perform these activities.
Yeah to me it's a feature. I don't need everything in a browser. In fact I am happier with dedicated programs (an email client, an FTP server, an IRC client, etc).
It means that I can choose a client that works for me. Whereas I cannot choose the UI I want for GitHub.
That is a much worse setup than what you were replying to. I've seen so many issues caused by centralized version control that Git - and yes, GitHub too - will never experience.
Yup. We moved FROM subversion to that setup. Subversion was hosted on a beige box from the previous era; it was slow as frack and a ticking timebomb.
Initially, after a git-svn period, we set up bare git+ssh on it as a direct replacement, but we quickly realised it was annoying and suboptimal, and that we could just set the same thing up decentralised, adjusting our habits for the better.
It's so refreshing to use: everything is a `git fetch --all` away (including tags). No more "I forgot to push before going on holidays" or surprise sick leave; we'd just power up the machine and not even log in, sshd would start, we'd git fetch, and power it down via the login manager or an ACPI event.
yes, and it's worth mentioning that the next evolutionary step after ssh was gitosis/gitolite, to manage who has access to which repos on a single ssh entry point, by taking advantage of the fact that multiple authorized keys could be used for one local ssh user.
enter two companies to rip off and monetize gitosis/gitolite, and eventually rewrite them into their own service, and presto, everyone has forgotten that git is both free and decentralized
> What's amazing is that you can also use git locally, without a server, using "git daemon".
I don’t think that’s amazing. What was amazing to me was back when I couldn’t set up a Subversion repository without a “server”.
Someone asks on StackOverflow about why `git push` behaves a certain way? The principle of Laziness dictates that you try to reproduce the issue with two sibling directories where one is a remote of the other one. Like they apparently do in one of Git’s integration tests:
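Something along these lines, presumably (a throwaway sketch; directory and file names are made up):

```sh
# two sibling directories, one acting as the other's remote
git init --bare server.git
git clone server.git client
cd client
echo demo > file.txt
git add file.txt && git commit -m "demo"
git push origin HEAD     # now reproduce whatever push behaviour is in question
```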
But does it matter? Nowadays even my editor wants to run a server in the background to provide syntax highlighting. People generally don't seem to care and see it as progress.
But what about the price of tea in China? The language servers you refer to provide a decoupled service to other programs. It has literally nothing to do with the fact that I have to (apparently; why would I even use SVN in this day and age) make a dummy repository somewhere hither or thither just so that I can version control my local-only dotfiles!
If you want to use git through a web server, it's one way of doing that. (For example, if you want to use some existing http auth for your users.) You don't particularly need to do that - it's just an option; git over ssh works fine too, as does git with file: local urls (or without "remotes" at all.) Professionally, ssh auth (especially with ControlPersist) has always been the better/faster option, but that's more of a cultural thing.
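For what it's worth, the ControlPersist bit is just a few lines of `~/.ssh/config`; a sketch (the host alias and timeout are arbitrary):

```sh
# reuse one ssh connection across repeated git fetch/push operations
Host githost
    HostName git.example.com
    User git
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```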
(My hobby workflow is to just start writing in a directory, and then after a little bit do a "git init"... and then once I have enough to pick a name, I have a "git save-project projectname" script that goes off and does a git init --bare on a homelab server and git push --set-upstream so it's now a remote. Just gradually escalating how "seriously" I'm treating it and therefore how much plumbing I bother with.)
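A sketch of what such a script might look like (the `homelab` host alias and the path are assumptions, not the actual script):

```sh
#!/usr/bin/env bash
# "git save-project <name>": create a bare repo on the homelab box and track it
set -euo pipefail
name="$1"
ssh homelab "git init --bare /srv/git/${name}.git"
git remote add origin "homelab:/srv/git/${name}.git"
git push --set-upstream origin "$(git symbolic-ref --short HEAD)"
```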
It's important to note that the git binaries are (usually) both client and server - by design, what with the whole "decentralised" thing.
So the default package will let you run server functionality such as git-http-backend (which allows the machine you run it on to serve repositories over http(s):// URLs).
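For anyone curious, wiring that up is only a few lines of web-server config. A minimal Apache sketch of the kind of setup git-http-backend expects (paths are assumptions and vary by distro, e.g. /usr/lib/git-core on Debian-derived systems):

```sh
SetEnv GIT_PROJECT_ROOT /srv/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
```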
Cool, I didn't know about 'git bundle'; nice to have another tool in my arsenal :)
I like to keep a bare copy of each repo locally, and use those as remotes for my "working copies". The `git worktree` command can be used in a similar way, but I feel safer using separate clones.
The article focuses on removable media (USB drives, CDs, etc.) which make automation awkward. If your remotes are more reliable (e.g. on the same machine, a LAN, or indeed the Internet) then git hooks can be useful, e.g. to propagate changes. For example, my local bare repos used hooks to (a) push to remotes on chriswarbo.net, (b) push to backups on github, (c) generate static HTML of the latest HEAD, and copy that to chriswarbo.net and IPFS.
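As a rough illustration (the remote names are placeholders, and the HTML generation step is elided), the hook in the local bare repo can be as small as:

```sh
#!/bin/sh
# hooks/post-receive in the local bare repo:
# propagate whatever just arrived to the public mirror and the backup
git push --mirror website-mirror
git push --mirror github-backup
# (regenerating the static HTML of HEAD would go here)
```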
Since the article mentions bundles, a related feature is git's built-in mail support. This can be used to convert commits into a message, and apply a message as a patch. I've used this a lot to e.g. move files from one project to another (say, helper functions from an application to a library) in a way which preserves their history (thanks to https://stackoverflow.com/a/11426261/884682 )
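The round trip is essentially two commands (a minimal sketch; the revision range and paths are illustrative, not the exact recipe from that answer):

```sh
# in the source repo: one mail-formatted .patch file per commit
git format-patch -o /tmp/patches origin/main..HEAD

# in the destination repo: re-create the commits, author/date/message included
git am /tmp/patches/*.patch
```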
git is foundational tooling, i.e. one of the tools that a developer must know.
But git is useful beyond the development community. A huge impact for a "weekend project".
"The development of Git began on 3 April 2005. Torvalds announced the project on 6 April and became self-hosting the next day. The first merge of multiple branches took place on 18 April. Torvalds achieved his performance goals; on 29 April, the nascent Git was benchmarked recording patches to the Linux kernel tree at a rate of 6.7 patches per second. On 16 June, Git managed the kernel 2.6.12 release." [1]
Git is foundational because it's a bunch of tools dealing with a very general data structure (the DAG of file versions, or whatever the semantically correct thing to say is).
Docker is something similarly powerful. It wraps around a few things (bunch of kernel namespaces, kinda reproducible, layered image format) and it is useful in many use-cases beyond microservices.
There are few other tools which I can say the same about. `jq` and curl are powerful and ubiquitous. But jq is a language, and curl is a tool for interacting with so many protocols. I don't know if I can put them in the same ballpark as git and docker.
I used a pull-request-over-email workflow at my first job ~14 years ago. It is decentralized in the same way email is decentralized. It works quite well, but I prefer the centralized tools.
Linux development is the most public example of this; here's an example from today:
I would be really curious to have your opinion about it as compared to how it was 14 years ago. I feel like SourceHut does a really good job helping on the tooling side (it's super easy to setup a mailing list).
PRs are decentralised. Everyone has their own "fork" of the repo, makes changes there, then tries to convince the maintainer of the "main" repo to pull.
This is exactly what the decentralised model was meant to do. It doesn't mean that every copy of the repo is equal in importance, but they are equal in functionality.
(Or, maybe you mean that it's "centralised" in the sense that it's all on github.com?)
Well, the git repo is decentralized (everyone has a copy of it). PRs are not. All the PRs live on one centralized server (usually github.com), such that if that server is down, nobody has access to the PRs.
When you send your patch to a mailing list (with `git send-email`), then that patch is immediately distributed to all the mail servers. This is decentralized: everyone gets a copy of your email (which is the equivalent of a PR in the email workflow).
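Concretely, the sending side is about two commands (a sketch; the list address is made up):

```sh
# turn the last few commits into mail-formatted patches, with a cover letter
git format-patch -3 --cover-letter -o outgoing/
# mail them to the list (or to any individual address)
git send-email --to=project-devel@lists.example.org outgoing/*.patch
```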
> Or, maybe you mean that it's "centralised" in the sense that it's all on github.com?
That's the other thing: the PR workflow tends to naturally push towards a monopoly. It is annoying to have to create an account on every GitLab instance under the sun just to send a PR. More and more, if your project is not on GitHub, people will not bother to create an account for you.
With the email workflow, you don't need an account: you just send your email to the mailing list (you don't even have to subscribe to it) or to whoever you want to send it to. This makes it much easier to deal with different servers (contributing to 100 projects on 100 different servers does not make you create 100 accounts).
I see your point. Do you think some kind of federation could help here? If there were 10-100 different github-like servers all set up to "git pull" from each other, would that make it a decentralised system to you? Even email is not fully decentralised anymore since we need to use a proxy to send email nowadays.
I read that Gitlab is working on some kind of federation of instances, where one could fork a repo from instanceA into instanceB and open a PR from instanceB to instanceA (instead of having to create an account on instanceA and fork the repo there).
I guess it would be good if it helped against this "lock-in" effect that pushes everybody to GitHub. Though I doubt GitHub would ever federate (obviously) and another lock-in factor is that people are used to the GitHub tooling and may be reluctant to use a different WebUI just for a few projects.
But that all sort of feels overly complicated to me, to be honest. Because the email workflow already solves that in a minimal way. I can send patches to any email address I want without creating an account, and I can use my favourite tools everywhere without even needing a browser.
How would that even work without merging all the relevant changes at some point?
If I want to use some open source software, I don't want to connect to every single forked repository to see if they might have local changes that I could need... and then deal with all sorts of merge conflicts.
I remember he mentioned this in a speech: he can merge changes from trusted people, and everyone has a personal list of people they trust... so he doesn't need to check everything himself, and if anything goes wrong, they can always go back to the last version.
Of course, at some point you probably want an authoritative main branch from which you make your releases (though there can be forks, too).
I meant more in terms of development. The way git is used most of the time is by having a main branch on GitHub, that contributors clone, work on a feature branch, then make a PR, have QA test it, and merge it. If that cycle is too slow, many times the devs will start complaining because they are "stuck until the branch is merged". Because the assumption is that everybody branches from the main branch.
But git was designed around the email workflow. Instead of a PR, you make a patch (or a group of patches), that you can share with others. For instance, you can share a patch with a colleague from your team, who will review it and incorporate it in their branch before it gets merged into main. That way they are testing it by using it. At some point your changes get merged into the main branch, and your coworkers can just apply their patches on top of it.
We can imagine a workflow with a hierarchy of maintainers: devs at the lower level send patches to their supervisor, who after a while sends them up to their supervisor, up to the main branch.
PRs flatten all that. Devs typically never learn how to deal with patches, and the review process ends up being "rebase on main, run the CI, then I'll skim through your code, I'll complain about some variable names and I'll approve without even pulling your code" (I'm exaggerating a bit, of course).
I think there is a lot of value in learning the git email workflow.
> I meant more in terms of development. The way git is used most of the time is by having a main branch on GitHub, that contributors clone, work on a feature branch, then make a PR, have QA test it, and merge it. If that cycle is too slow, many times the devs will start complaining because they are "stuck until the branch is merged". Because the assumption is that everybody branches from the main branch.
That my VCS is distributed goes without saying at this point. Nothing else is good enough. But a completely different dimension is integration. And I want to be as integrated as possible. And for everyone else as well. I don't want three different cliques working on disparate things. And I warn against such things at work as well (i.e., hey, let's make a branch for this project which 3/9 of us are going to work on for a month...).
> But git was designed around the email workflow. Instead of a PR, you make a patch (or a group of patches), that you can share with others. For instance, you can share a patch with a colleague from your team, who will review it and incorporate it in their branch before it gets merged into main. That way they are testing it by using it. At some point your changes get merged into the main branch, and your coworkers can just apply their patches on top of it.
This certainly has merit. I mean peer-to-peer integration branching. But you can do the same with forges and branches. In fact it’s more difficult to keep track of patches via email, i.e. what comes from where, have I included this already, etc. Just consider the conversations that seem to keep coming up on how to deal with identifying patches.[1]
> We can imagine a workflow with a hierarchy of maintainers: devs at the lower level send patches to their supervisor, who after a while sends them up to their supervisor, up to the main branch.
I mean you should expect Whatever The Law about the org hierarchy being reflected in processes, not vice versa. Apparently there isn’t a process for this command hierarchy to be reflected in.
> PRs flatten all that. Devs typically never learn how to deal with patches, and the review process ends up being "rebase on main, run the CI, then I'll skim through your code, I'll complain about some variable names and I'll approve without even pulling your code" (I'm exaggerating a bit, of course).
Patches or not doesn’t really change that. If one single email thread becomes the focal point of a month-long development “PR” then that’s the same thing.
But yes. Getting out of that dang “fork” mindset is good. You can be more loosey goosey via email since you just inline your suggestions as patches (either with a commit message or without).
It would be wonderful if it were distributed by default more easily. For instance, an ipfs or torrent backend which automatically provides a backbone of thousands of computers with the repo on them as the remote, rather than just the single github server.
It was designed around the email workflow, which is distributed by default: you send your patches to a mailing list, which is distributed between all the email servers and clients. And it's easy to host mirrors, too.
GitHub and the PR workflow tend to make it all centralized.
Sure, but the work of typing out 1000 people's IP addresses as remotes and managing that sounds like a nightmare that should be offloaded somehow. Adding "ipfs" to the remote and having it managed by a system to push to thousands of devices (however many have cloned the repo) is much more concise and simple.
> the work of typing out 1000 people's IP addresses as remotes and managing that sounds like a nightmare
Not sure I get that. You do `git send-email --to=<mailing-list-address>` and that's it. Everyone on the mailing list gets a copy of your patch, that they can apply if they want.
Wow, perhaps I am unaware of some of the wonderful capabilities of git!
So if everything on the entirety of github.com was deleted, you could still do a git clone somehow and it would pull it from everyone who has ever cloned that repo?
Because that's what I'm referring to here - p2p not hub-spoke architectures.
I'm fairly certain git doesn't inherently have this feature, unless the backend remote could automatically start an IPFS or torrent daemon and deal with torrent or IPFS for pulling and pushing.
I am talking about the collaboration process: sharing patches with collaborators.
I think you are talking about distributing the git server itself over p2p. Which I find less important because I don't think that the server bandwidth is usually a problem. Or is it?
If I wanted to use git on IPFS, I could host a node and push all my git commits there, but I'd have to convince others to host it as well, correct? So perhaps it would be awesome for majorly popular projects, but smaller projects would still end up with a SPOF, right?
The idea is to have the daemon automatically start and serve the git repo anytime a git clone is done, that way the number of nodes hosting the repo (speed of download) grows with the popularity of the repo. That way you could saturate a 10G connection on a popular repo, whereas with github you'll likely be pretty throttled or limited.
The real problem is that even though being distributed is cool and can help to ensure that your work exists in more than one place, either for reasons of collaboration or backup, humans still like to cling to the idea of having a "single source of truth".
So, if Alice collaborates by pull/pushing with Bob, and Bob works by patches over email with Charlie, and David exchanges with Charlie and Alice - then what is the "true" state of the repository exactly?
The easy way out is that everyone pulls/pushes with github and we use that as "truth".
> or all distributed copies have to always be kept in sync somehow
This isn't actually true. The collaborators simply have to ... collaborate. It depends on what the goal is with the data being kept in git. Git doesn't tell you how to collaborate.
It could be that the group elects one person (e.g. Alice) to do releases, so, it stands that only that code which reaches Alice will ever get released. If that doesn't happen, you haven't collaborated correctly.
It could be that any one of the group could release, in that case you collaborate to get your commits to any of those people.
It could be that there are no releases ever made, and the group loosely exchanges their branches to build whatever interests them.
Can you not also push to your fork, and GitHub guides you to make a PR from your fork/branch to the upstream? IIRC, you don't have to deal with the upstream locally?
Yes. You only need to add the upstream repo if you want to fetch the newest changes. So it is something that you will probably only run into on your second PR.
I wouldn't be surprised if the vast majority of GitHub users never send a second PR to a third-party project. Probably the majority of users only contribute to their own (or their company's own) repos. Then a small number send at least one pull request to another user's repo. Fewer still would send more than one in a way that requires pulling new changes from upstream.
Great article. Also note that git-bundle can be used to manually transfer a range of commits between two computers. Suppose the sender's repository is at version 10 but the receiver is at version 4. On the sender's side, you can create a bundle of versions 5 through 10 and save it as a single file. You can move the file to the receiver using whatever method you choose. On the receiver's side, you can essentially "git pull" that set of patches. This technique has helped me in quite a few environments.
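In practice that looks something like this (the `last-sync` tag marking the receiver's current commit, and the `main` branch name, are assumptions):

```sh
# on the sender: pack everything newer than what the receiver already has
git bundle create update.bundle last-sync..main

# carry update.bundle across on a USB stick, then on the receiver:
git bundle verify update.bundle   # checks the required ancestor commits exist locally
git pull update.bundle main       # a bundle file can be pulled from like any remote
```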
I'm doing this for private notes that I don't want on a git hoster. Of course, without network delay everything is super snappy. You only need to make sure that you have backups, in case one of your disks goes up in flames or something.
Far too many people think of git as a tool to push/pull code from a remote location. A glorified scp basically. Git is a distributed version control system.
I often use local bare repos for experimentation, and for backup to other machines on my LAN, in case my main hard drive goes dead or github decides to unexist.
It is also possible to exchange differences between locations using patch files, with the git commands `format-patch` and `apply`. Patch files are usually a bit smaller and can also be easily mailed.
A minor correction: it's usually preferable to apply commits with `git am` instead of `git apply`, as it applies the commit with all its metadata, not just the diff.
I had to use git bundle at a government contract job where they took over a month to issue my hardware that was able to access their GitHub repo. Pretty convenient actually (compared to whatever the alternatives may be)
I just learned that last year, too, when my last employer did not have any versioning tool whatsoever, except Windows network shares (yes, there are more than enough companies that do it that way; it's the second time I encountered such a horror scenario). So I just set up a bare git repository on the network share and used this to keep my project there.
Glad they understood that git does make sense and set up a GitLab server soon after.
Another cool thing that I believe git does in these situations is that it hard-links the blobs in the .git folder if you do a local clone. It makes sense: the blobs are immutable and content-addressed, no need to store two copies! Just have two links to the same file system object, save a bunch of disk space.
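You can see (or disable) that behaviour directly (paths here are made up):

```sh
# a plain-path clone hard-links the objects under .git/objects by default
git clone /srv/repos/project.git project-copy

# force real copies instead (e.g. if the clone will be moved to another filesystem)
git clone --no-hardlinks /srv/repos/project.git project-copy2
```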
I had to set up an entire government team's workflow that was required to be isolated completely from any network. I had to create multiple remotes that were filepaths that pointed to specific USB drives. Depending on which USB drive was connected to their laptop, a developer was able to push and pull any changes to their codebase.
It felt unintuitive at the time but thinking back, this team was able to produce code much faster than other teams that didn't have a similar workflow set up.
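Something like this, presumably (mount points and names are guesses; the drives would hold bare repos):

```sh
# one remote per approved USB drive
git remote add usb-red  /media/usb-red/project.git
git remote add usb-blue /media/usb-blue/project.git

# push to / pull from whichever drive happens to be plugged in
git push usb-red main
git fetch usb-blue
```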
Fossil (https://www.fossil-scm.org) is superior for this use case in almost every way I can think of. It was in many ways designed for this use case.
A fossil repository is a single-file SQLite database. You can copy that single file to another computer and treat it like a remote, sync with it, etc. with a simple "fossil sync" command. The single file includes all the ticketing (issues), wiki, discussion forum and all branches, and those are all synced as well. There's no need to do any special packaging or bundling. Plus you get a built in web UI.