>> What we find is that many of the repositories that tax our servers the most are not unusually big. The most challenging repositories to host are often those that have an unusual internal layout that Git is not optimized for.
Those who only know Linus from his rants might be surprised that here "his usual fashion" means:
- Acknowledging that the root cause was GitHub's documentation being misleading.
- Not blaming the contributor for being misled by GitHub: "I can see why that documentation would make you think it's the right thing to do."
- Admitting that the ease with which the accident happened is a deficiency in Git's UI.
- CCing the Git maintainer to discuss improving Git to make it harder to do this by accident. (This eventually led to the --allow-unrelated-histories flag being required for this kind of merge.)
The Linux kernel has been developed over 25 years by thousands of contributors, so it is not at all alarming that it has grown to 1.5 GB. But if your weekend class assignment is already 1.5 GB, that’s probably a strong hint that you could be using Git more effectively!
Git is only 12 years old, so how does Linux have 25 years of history there? As far as I know, Linux used patches on mailing lists before git. Were those also somehow transferred to the repo?
The first link describes using git's "grafts" feature to make the UI believe that the first commit in the normal repo actually has parents. That means you can use the repo normally and agree with everyone else about commit IDs, but `git log` will also go all the way back to Linux 0.0.1. I had this set up on my work machine in 2012 and it was useful a few times, but in the last couple of years I haven't really needed to see history past 2.6.12.
(But yes, the history repos don't explain the size of the normal linux.git repo - except to the extent that you need to spend over a decade writing an OS to get even that many lines of code in the first commit and that much activity shortly thereafter.)
commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (tag: refs/tags/v2.6.12-rc2)
Author: Linus Torvalds <torvalds@ppc970.osdl.org>
Date: Sat Apr 16 15:20:36 2005 -0700
Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
Actually, that's how VCSs before git used to work, and it's what git changed. Git doesn't store patches; it stores full snapshots of the repository in a content-addressable fashion. That's one of its key insights: instead of needing an always-correct way to encode deltas, just encode the state itself and leave it to the tools to figure out what the diff should be. That way you're not baking into your disk format something that a later version of the tool could do better.
That said, git doesn't just store direct copies either. It bundles objects into what it calls packfiles, applying compression and delta encoding to reduce disk space and make it quicker to find a given version of a file.
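To make the content-addressing point concrete, here is a minimal Python sketch of how git derives the ID of a file's content (a "blob"): it hashes a short header, `blob <size>` plus a NUL byte, followed by the raw bytes. The function names are hypothetical, the sketch covers only loose objects (not packfiles), and it assumes the default SHA-1 object format rather than a SHA-256 repository.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """Object ID git gives a blob with this content.

    Assumes the default SHA-1 object format (not a SHA-256 repository).
    """
    header = b"blob %d\x00" % len(content)  # type, space, byte length, NUL
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Compressed contents of a loose (unpacked) blob object file.

    The exact compressed bytes depend on the zlib level, but they always
    decompress to header + content.
    """
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


if __name__ == "__main__":
    first = b"hello world\n"
    second = b"hello world\n"  # the same content, e.g. unchanged in a later commit
    # Identical content always maps to the identical ID, so it is stored once.
    print(blob_id(first) == blob_id(second))  # True
    print(blob_id(first))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The second ID should match what `echo 'hello world' | git hash-object --stdin` prints. Packfiles go further by storing some objects as deltas against similar ones, which this sketch does not attempt.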
I noticed just today that GitHub has a number of built-in countermeasures against absurd git repositories when you try to push something. For instance, I imported a huge Subversion repository (3GB, mostly due to large, frequently updated files in the history) into git and got failures due to individual files exceeding 100MB. This was quite helpful for bringing my repository down to a reasonable size. Tools like https://rtyley.github.io/bfg-repo-cleaner/ are indispensable for doing this kind of filtering without a headache.
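When an import trips a size limit like this, it helps to list every blob (file version) anywhere in history that exceeds it. A minimal sketch, assuming you first generate a listing with `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'` and pass its text to `find_big_blobs`, a hypothetical helper; the default threshold reads the comment's "100MB" as 100 MiB, which is an assumption.

```python
from typing import List, NamedTuple

# Assumed threshold: the comment's "100MB", interpreted as 100 MiB.
DEFAULT_LIMIT = 100 * 1024 * 1024


class BigBlob(NamedTuple):
    object_id: str
    size: int
    path: str


def find_big_blobs(batch_check_output: str, limit: int = DEFAULT_LIMIT) -> List[BigBlob]:
    """Return blobs larger than `limit` from `git cat-file --batch-check` output.

    Each line is expected to look like "<type> <object id> <size> <path>",
    where <path> may be empty (commits, the root tree) or contain spaces.
    """
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # keep any spaces inside the path intact
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        obj_type, object_id, size_text = parts[:3]
        path = parts[3] if len(parts) == 4 else ""
        if obj_type == "blob" and int(size_text) > limit:
            found.append(BigBlob(object_id, int(size_text), path))
    # Largest first, which is usually the order you want to deal with them in.
    found.sort(key=lambda blob: blob.size, reverse=True)
    return found


if __name__ == "__main__":
    # Made-up object IDs, for illustration only.
    sample = (
        "commit 1111111111111111111111111111111111111111 250 \n"
        "blob 2222222222222222222222222222222222222222 157286400 assets/video.mp4\n"
        "blob 3333333333333333333333333333333333333333 1024 README.md\n"
    )
    for blob in find_big_blobs(sample):
        print(blob.size, blob.path)  # 157286400 assets/video.mp4
```

BFG itself has a `--strip-blobs-bigger-than` option (e.g. `--strip-blobs-bigger-than 100M`), so a listing like this is mainly useful for seeing what would be removed before rewriting history.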
While they're at it, why not: dpkg, docker, entropy, flatpak, guix, ipkg, netpkg, opkg, pkgng, pacman, rpm, snappy?
It's better to leave packaging to each distro's maintainers than to spend 80% of your time preparing release packaging for every single package manager out there. Or leave it to super keen folks who want to do it specifically for your project, and even then they'll only be keen on one or two platforms.
1. Releasing a new binary tool without any package manager support just sucks for your users in general, because it means they have to install it manually, and most of them will probably end up with a horribly outdated version of your tool for a long time, because their package manager can never tell them that it's out of date.
2. macOS isn't a distro, so you can't just say "let your distro maintainers do it". If you don't submit to MacPorts, the only way you'll get in there is if someone else steps up to submit on your behalf, but that kinda sucks because your package will likely end up out of date in MacPorts unless the volunteer maintainer is super diligent about noticing new releases and updating the Portfile.
Nix is a more general-purpose packaging system, but it also suffers from this problem. In fact, in my experience, Nix packages do tend to be out of date for a while before someone notices and fixes it.
FWIW I don't really expect people to actually submit their own tools to Nix anyway, because there's a fairly steep learning curve there, but it would be really awesome if people did. But submitting to MacPorts is more straightforward.
So by that logic I have to support mac/nix over every other system by default as they don't have maintainers? That sounds like a mac/nixos problem, not a developer problem.
If you want your tool to actually get used, you should put in at least a little effort towards trying to get it in package managers. I don't know why you're acting so surprised about that.
I've always wondered why GitHub doesn't display file sizes at the folder level. The only way on the website is to drill down to the individual file.
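Locally, a per-folder total is easy to compute from git's own tree listing. A minimal sketch, assuming the input is the output of `git ls-tree -r -l HEAD` (mode, type, object ID, space-padded size, then a tab and the path); `folder_sizes` is a hypothetical helper, submodule entries (type `commit`, size `-`) are skipped, and git's quoting of unusual filenames (avoided with `-z`) is not handled.

```python
import posixpath
from collections import defaultdict
from typing import Dict


def folder_sizes(ls_tree_output: str) -> Dict[str, int]:
    """Sum blob sizes per folder from `git ls-tree -r -l` output.

    Each file's size is added to every ancestor folder, so a folder's total
    includes its subfolders. The repository root is reported as "".
    Assumes plain (unquoted) paths.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in ls_tree_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        meta, path = line.split("\t", 1)
        fields = meta.split()  # mode, type, object ID, size
        if len(fields) != 4 or fields[1] != "blob" or not fields[3].isdigit():
            continue  # skip submodules (size "-") and anything unexpected
        size = int(fields[3])
        folder = posixpath.dirname(path)
        while True:
            totals[folder] += size
            if folder == "":
                break  # reached the repository root
            folder = posixpath.dirname(folder)
    return dict(totals)


if __name__ == "__main__":
    # Sample `git ls-tree -r -l HEAD` output; the last three object IDs are made up.
    sample = (
        "100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad      12\tREADME\n"
        "100644 blob 1111111111111111111111111111111111111111    2048\tsrc/main.c\n"
        "100644 blob 2222222222222222222222222222222222222222    4096\tsrc/lib/util.c\n"
        "160000 commit 3333333333333333333333333333333333333333       -\tvendor/sub\n"
    )
    for folder, size in sorted(folder_sizes(sample).items()):
        print(f"{folder or '.'}: {size} bytes")
```

With the sample input this prints `.: 6156 bytes`, `src: 6144 bytes` and `src/lib: 4096 bytes`. These are uncompressed blob sizes at one commit, not the on-disk size of the repository's history.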
Like CocoaPods!
https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...