
I encountered a large company where they had a private git server for their engineering teams.

Over time someone discovered that the number of repositories and the amount of usage were much greater than expected. It turned out that non-engineering folks who had contact with engineering had asked how the engineers managed their code, what branches were, and so on. Some friendly engineering teams had explained. Then some capable non-engineering employees discovered that the server was open to anyone with a login (as far as creating and managing your own repositories went) and started using it to manage their own files.

The unexpected users mostly used it on a per-user basis (not as a team), as the terminology tripped up or slowed down a lot of non-engineering folks, but individuals really liked it.

IT panicked and wanted to lock it down but because engineering owned it ... they just didn't care / nothing was done. They were a cool team.




Unfortunately git does not handle binary files elegantly (unless you use git-lfs). You can inflate storage rapidly by, say, editing a 10M zip file a few times. I've had to GC more than one repo where someone accidentally added an innocuous binary file, and the next thing you know the repo has exceeded 2G of storage space.
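For anyone trying to find which files are responsible before running a GC, here is a minimal sketch that lists the largest blobs anywhere in a repo's history. It assumes `git` is on the PATH and feeds `git rev-list --objects --all` into `git cat-file --batch-check`; the function names `parse_batch_check` and `largest_blobs` are made up for illustration.

```python
import subprocess


def parse_batch_check(output):
    """Parse '<type> <sha> <size> <path>' lines; return blobs as (size, sha, path)."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return blobs


def largest_blobs(repo_dir, top=10):
    """Return the `top` largest blobs reachable from any ref, biggest first."""
    # Every object reachable from every ref, one "<sha> <path>" per line.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for type and size of each object; %(rest) echoes the path back.
    # Note: %(objectsize) is the uncompressed size, not the on-disk size.
    details = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, check=True, capture_output=True, text=True, input=objects,
    ).stdout
    return sorted(parse_batch_check(details), reverse=True)[:top]
```

Calling `largest_blobs(".")` inside a checkout returns `(size, sha, path)` tuples. Each edited version of a binary appears as its own blob, so a zip that was changed a few times shows up once per version.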


> I've had to GC more than one repo where someone accidentally added an innocuous binary file

My god, the things I've seen in repos. vim .swp files. Project documentation kept as Word documents and Excel spreadsheets. Stray core dumps and error logs in random subdirectories, dated to when the repo was still CVS. Binary snapshots of database tables. But the most impressive by far was a repo where someone had managed to commit and push the entirety of their My Documents folder, weighing in at 2.4GB.


If you crawl a package repository such as PyPI, you will find a lot of that same stuff in packages as well. Which is even weirder because those are created from a setup.py which does not have a `git add .` equivalent. People are not good at building clean archives.


I found git-lfs to be a huge pain, since the "public" server implementations are basically github and gitlab. We have plain git repos (via NFS/ssh plus bugzilla hooks), so we either have to use some random-github-user's virtually unmaintained implementation or roll our own. Neither is a great option. On the other hand, we put our custom-built GCCs plus sources into a git repo, and trust me, having an 8GB repo (after a few version bumps) is really annoying, so having git-lfs would be plain amazing.

(I checked this out the day before I left for vacation, so to be fair, my research might not have been thorough enough to find each and every implementation, but I think it is comprehensive enough to make a preliminary judgement.)


Did you try the lfs-test-server?

https://github.com/git-lfs/lfs-test-server


We've got Bitbucket's LFS pointed to our Artifactory server. Not the cleanest solution, but we haven't had any major problems in over a year.


External hosting is not an option for us ;) The gccs are the biggest pain point, but customer projects plus binaries are the other - and those are just too sensitive to be pushed into someone's cloud.


Both our Bitbucket and Artifactory instances are internally hosted.


Luckily storage is getting cheaper. I do wish someone hadn't checked a custom-built nginx binary into ours though.


Even with infinite storage, having lots of blobs can make a repo unmanageable. In order to get an 8GB repo onto github, I had to make temporary branches and push them incrementally.
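One way to script that kind of incremental push, as a rough sketch and not necessarily what was done here: walk the branch's first-parent history oldest first, and push every Nth commit to a temporary branch so that no single push has to carry the whole repo. The names `checkpoints`, `push_in_chunks` and the `incremental-push` branch are hypothetical, and the step size is a guess you would tune to whatever the server accepts.

```python
import subprocess


def checkpoints(commits, step):
    """Pick every `step`-th commit (oldest first), always ending with the tip."""
    if step < 1:
        raise ValueError("step must be at least 1")
    picked = commits[step - 1::step]
    if commits and (not picked or picked[-1] != commits[-1]):
        picked.append(commits[-1])
    return picked


def push_in_chunks(repo_dir, remote, branch, step=1000,
                   temp_branch="incremental-push"):
    """Push `branch` to `remote` in slices via a temporary remote branch."""
    # Oldest-first list of commits along the branch's first-parent line.
    commits = subprocess.run(
        ["git", "rev-list", "--reverse", "--first-parent", branch],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.split()
    for sha in checkpoints(commits, step):
        # Each push fast-forwards the temporary branch a bit further.
        subprocess.run(
            ["git", "push", remote, f"{sha}:refs/heads/{temp_branch}"],
            cwd=repo_dir, check=True,
        )
    # The objects are already on the server, so these two pushes are cheap.
    subprocess.run(["git", "push", remote, branch], cwd=repo_dir, check=True)
    subprocess.run(["git", "push", remote, "--delete", temp_branch],
                   cwd=repo_dir, check=True)
```

Usage would be something like `push_in_chunks(".", "origin", "master", step=500)`. If one slice is still too big (say, a single commit that adds a huge blob), a smaller step will not help and the blob itself has to be dealt with.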

I highly recommend git-annex. It is like git-lfs: a bit less mature, but much more powerful. Especially good if you don't want to set up a centralized lfs server.


Yea, I recommend git-annex too.


It's not just a question of storage: as the size of the repository increases, git starts having a hard time dealing with it.

Binary files as such don't cause the issue, but because they don't deltify / pack well, significant use of them makes repos degenerate much faster.


I heard of a web consultancy around 2006 where the Subversion repository history contained a full copy of the Rolling Stones discography in MP3.



