Scaling Git, and some back story (microsoft.com)
143 points by dstaheli on Feb 4, 2017 | 17 comments



> 1st party == 3rd party

This was the first thing I noticed about Visual Studio Team Services when I first looked at integrating my search and code analytics engine with VSTS: it was quite apparent that they wanted to make 3rd-party developers first-class citizens.

Anybody who has ever worked in the enterprise knows that feature requirements are heavily driven by politics, and if you can't support the weirdest edge cases, resistance to adoption can become insurmountable. Looking at VSTS, you can easily tell they wanted to reduce pushback as much as possible.


It's a lofty goal to create tools that support the direction of one of the world's largest software companies while supporting single-person dev shops just as seamlessly. I don't know if it makes sense (i.e. would Google's monorepo scale down like that? Is Microsoft hamstringing itself this way?), but I applaud the effort.


Related backstory/POV in a tweet thread from one of the lead developers behind this effort, who's now outside MSFT: https://twitter.com/xjoeduffyx/status/827633982116212736


I think they sort of gave up too soon on splitting up their repos. We've been through this before and made BitKeeper support a workflow where you can start with a monolithic repo, have ongoing development in it, and have another "cloud" of split-up repos, sort of like submodules except with full-on DSCM semantics.
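For comparison, the closest thing stock Git gives you is submodules, which pin one commit of each sub-repo rather than giving the pieces full DSCM semantics. A rough sketch, with a hypothetical URL and path:

  # add a component repo, pinned at a specific commit
  git submodule add https://example.com/libfoo.git components/libfoo
  git commit -m "Add libfoo as a submodule"

  # everyone cloning later has to remember to materialize the pins
  git clone https://example.com/monorepo.git
  git submodule update --init --recursive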

Might take a look at section 5 of this:

http://mcvoy.com/lm/bkdocs/productline.pdf

which has some Git vs BK performance numbers. We actually made BK pretty pleasant in large repos even over NFS (which has to be slower than NTFS, right?).

And BK is open source under the Apache 2 license so there are no licensing issues.

I get it, Git won, clearly. But it's a shame that it did, the world gave up a lot for that "win".


Great to see MS working on this, and also posting the code!

"As a side effect, this approach also has some very nice characteristics for large binary files. It doesn’t extend Git with a new mechanism like LFS does, no turds, etc. It allows you to treat large binary files like any other file but it only downloads the blobs you actually ever touch."

It seems like every day I see another attempt to scale Git to support storage of large files. IMHO, lack of large file support is the Achilles' heel of Git. So far I am somewhat happy with Git LFS despite some pretty serious limitations, mainly the damage a user who doesn't have Git LFS installed can inflict when they push a binary file to a repo.

I'm curious what other folks on HN use to store large files in Git without allowing duplication?
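
For context on that failure mode: LFS keeps the actual binary content out of the repository and checks in a small pointer file instead, roughly like this (the hash and size are just placeholders):

  version https://git-lfs.github.com/spec/v1
  oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
  size 12345

A user without the LFS filters installed commits the raw binary bytes where that pointer should go, so the blob lands directly in Git history, where it bloats every clone and is painful to scrub out.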


I have yet to run into an issue where I even want a large file in Git.


The common case that I see is binaries being versioned through the VCS: large binaries, be they libraries or the application itself, are stored in the VCS, and from that point on it serves as the source of truth.

Git explicitly called this out as a bad practice. Other vendors, like Perforce, never really did an amazing job with it either, but it worked, and on top of that it created more reliance on the vendor's system. Now that most people see the sheer productivity gains of Git over centralized VCS systems, everyone wants to move, but that comes with a catch: many of these companies have large workflows built around the way their old VCS works, and some even have that methodology written into compliance rules like their SOX regulations.

It's this set of people that generally has this issue. For everyone else, just using boring old disks is fine for packages and built products, since those can be recreated from the SCM. Now, with LFS in Git, you can keep the same workflow you had in your old VCS without restructuring the entire organization around the change.
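
A minimal sketch of that migration path (the file patterns are just examples):

  git lfs install                # wire up the LFS clean/smudge filters, once per machine
  git lfs track "*.dll" "*.zip"  # record the patterns in .gitattributes
  git add .gitattributes
  git commit -m "Route prebuilt binaries through LFS"

From there the day-to-day add/commit/push workflow is unchanged; LFS swaps the tracked binaries for pointers behind the scenes.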


BitKeeper did stuff like LFS, except you can have more than one binary server:

http://www.mcvoy.com/lm/bkdocs/HOWTO-BAM.html


The sooner they admit TFS is dead and commit 100% to git the better.

In fact, I think everyone should use git :)


TFS supports git. TFS is a lot more than just source control.


I think you mean Visual Source Safe ;)


I think he means Team Foundation Version Control


Related discussion: https://news.ycombinator.com/item?id=13559662

This article covers the end-to-end approach, whereas the other article and discussion focus more on GVFS, the filesystem driver used to scale Git to repositories with hundreds of thousands of files and hundreds of gigabytes of history.


Good story. It'd be interesting to see a portable version (which I guess would have to either run on Mono or be rewritten in something else); or maybe Google will release some of theirs. I'm impressed that Microsoft had the courage to scale mostly-vanilla git instead of hacking Mercurial.


During the presentation at Git Merge, Microsoft mentioned that they are hiring Linux and macOS driver experts. This suggests that they plan to release the FUSE driver themselves.


These articles are less genuine and interesting because it's the same person submitting with the same theme: https://news.ycombinator.com/submitted?id=dstaheli


It's likely just a topic of interest. The submission frequency is not very high. Judge the piece on its merits.



