
Unfortunately git does not handle binary files elegantly (unless you use git-lfs). You can inflate storage rapidly by, say, editing a 10M zip file a few times. I've had to GC more than one repo where someone accidentally added an innocuous binary file, and the next thing you know the repo has exceeded 2G of storage space.
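When a repo balloons like that, the first step is finding out which blobs are responsible. A sketch, assuming a reasonably recent git (the threshold of ten is arbitrary):

```shell
# List the ten largest blobs anywhere in the repo's history, with their
# paths. Sizes are in bytes; the biggest appear last.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '/^blob/ { print $3, $4 }' |
  sort -n |
  tail -n 10
```

Note that a gc alone won't reclaim space for a blob that's still reachable from history; once you've identified the offender, you need a history rewrite (e.g. with git filter-repo) before the gc actually shrinks anything.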



> I've had to GC more than one repo where someone accidentally added an innocuous binary file

My god, the things I've seen in repos. vim .swp files. Project documentation kept as Word documents and Excel spreadsheets. Stray core dumps and error logs in random subdirectories, dated to when the repo was still CVS. Binary snapshots of database tables. But the most impressive by far was a repo where someone had managed to commit and push the entirety of their My Documents folder, weighing in at 2.4GB.
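Most of that junk can be kept out up front with a repo-level ignore file. A sketch covering the offenders above (the patterns are illustrative; extend them for whatever your toolchain leaves behind):

```shell
# Append common junk patterns to the repository's .gitignore.
cat >> .gitignore <<'EOF'
*.swp
*.swo
core
core.*
*.log
EOF
```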


If you crawl a package repository such as PyPI, you will find a lot of the same stuff in packages as well. Which is even weirder, because those are built from a setup.py, which has no `git add .` equivalent. People are not good at building clean archives.
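One cheap defence is to list the archive contents before uploading and grep for obvious junk. A sketch; `dist/mypkg-1.0.tar.gz` is a placeholder name, and the pattern only catches the usual suspects:

```shell
# Print any suspicious entries in the sdist; no output means none matched.
tar -tzf dist/mypkg-1.0.tar.gz |
  grep -Ei '\.(swp|log|pyc)$|__pycache__|core\.' || true
```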


I found git-lfs to be a huge pain, since the "public" server implementations are basically GitHub and GitLab. We have plain git repos (via NFS/ssh plus Bugzilla hooks), so we'd either have to use some random GitHub user's virtually unmaintained implementation or roll our own - neither a great option. On the other hand, we put our custom-built GCCs plus sources into a git repo, and trust me, having an 8GB repo (after a few version bumps) is really annoying, so having git-lfs would be plain amazing.
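For what it's worth, the client side of LFS is simple: `git lfs track` just appends an attribute line to `.gitattributes`, and matching files are then stored in the repo as small pointer files while the payload goes to whichever LFS server is configured. A sketch (the `*.tar.xz` pattern for toolchain tarballs is illustrative):

```shell
# The attribute line that `git lfs track "*.tar.xz"` writes; commit
# .gitattributes alongside the tracked files so every clone applies the
# same filter.
echo '*.tar.xz filter=lfs diff=lfs merge=lfs -text' >> .gitattributes
```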

(I checked this out the day before I left for vacation, so to be fair, my research might not have been thorough enough to find each and every implementation - but I think it was comprehensive enough to make a preliminary judgement)


Did you try the lfs-test-server?

https://github.com/git-lfs/lfs-test-server


We've got Bitbucket's LFS pointed at our Artifactory server. Not the cleanest solution, but we haven't had any major problems in over a year.


External hosting is not an option for us ;) The gccs are the biggest pain point, but customer projects plus binaries are the other - and those are just too sensitive to be pushed into someone's cloud.


Both our Bitbucket and Artifactory instances are internally hosted.


Luckily storage is getting cheaper. I do wish someone hadn't checked a custom-built nginx binary into ours though.


Even with infinite storage, having lots of blobs can make a repo unmanageable. In order to get an 8GB repo onto GitHub, I had to make temporary branches and push them incrementally.
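The incremental-push trick works by pushing progressively newer commits to the same remote branch, so each transfer only carries part of the history. A toy demonstration (a sketch; on a real 8GB history you'd pick much larger steps, e.g. HEAD~5000, HEAD~2500, HEAD):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare remote.git            # stand-in for the real remote
git init -q work && cd work
git config user.email you@example.com
git config user.name "You"
for i in 1 2 3 4 5; do echo "$i" > file; git add file; git commit -qm "commit $i"; done
git remote add origin ../remote.git
# Push progressively newer commits so each transfer stays small:
for n in 4 2 0; do
  git push -q origin "$(git rev-parse "HEAD~$n"):refs/heads/main"
done
git -C ../remote.git rev-list --count main   # -> 5: full history arrived
```

Each intermediate push is a fast-forward of the previous one, so no force flags are needed.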

I highly recommend git-annex. It's like git-lfs, but a bit less mature and much more powerful. Especially good if you don't want to set up a centralized LFS server.


Yeah, I recommend git-annex too.


It's not just a question of storage: as the size of the repository increases, git itself starts having a hard time coping.

Binary files don't cause the issue by themselves, but because they don't deltify or pack well, heavy use of them makes repos degenerate much faster.
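You can see the effect directly: two revisions of a compressible text file pack down to almost nothing, while two revisions of incompressible binary data each cost their full size. A self-contained sketch (1 MiB files; sizes are illustrative):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "You"
# Text: 1 MiB of repeated lines, then a tiny edit.
yes "the same line of text" | head -c 1048576 > text.txt
git add text.txt && git commit -qm "text v1"
echo "one more line" >> text.txt
git commit -qam "text v2"
# Binary: 1 MiB of random bytes, then replaced with fresh random bytes.
head -c 1048576 /dev/urandom > data.bin
git add data.bin && git commit -qm "bin v1"
head -c 1048576 /dev/urandom > data.bin
git commit -qam "bin v2"
git gc --quiet
# size-pack (in KiB) is dominated by the two ~1 MiB binary blobs; the two
# text revisions compress and deltify to a few KiB combined.
git count-objects -v | grep size-pack
```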


I heard of a web consultancy around 2006 where the Subversion repository history contained a full copy of the Rolling Stones discography in MP3.



