Abandon your DVCS and return to sanity

iand675 · on March 3, 2015

Anyone else remember the days when having your SVN server go down meant that you more or less had to wait until the server came back up before you could get back to doing things? (if you wanted a nice atomic commit). Let's not forget people that have to travel a lot for work– SVN + airplanes is a match made in hell.

Anyone else ever had someone delete or close an open source project that you cared about? Finding a copy with the history that mattered to you could be tricky.

Anyone else ever needed their own fork of something due to differing goals from the parent project, but needed to fold in upstream changes relatively often? That's by nature the sort of problem that a DVCS solves easily.

Other objections to this article:

* Nothing precludes you from using the patch model with DVCS (I mean, Linux kernel development uses Git just fine with this)

* The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap. Another point worth mentioning is that you can make shallow clones with Git. I don't know what the status is for committing to them these days, but there's nothing fundamental that should prevent a DVCS from letting you work on shallow clones if space is such a big deal.

I could go on, but the article seems to be griping about UX issues. Just because we haven't had tools in this space with wide adoption that are user friendly doesn't mean that we won't eventually.

Xylakant · on March 3, 2015

> Anyone else remember the days when having your SVN server go down meant that you more or less had to wait until the server came back up before you could get back to doing things? (if you wanted a nice atomic commit).

That's now known as "I can't deploy because github is down". Note that the author proposes "local commits plus a centralized storage".

> The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap.

Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of month, so I have to push them to a server and somehow link them.

> Anyone else ever needed their own fork of something due to differing goals from the parent project, but needed to fold in upstream changes relatively often? That's by nature the sort of problem that a DVCS solves easily.

That's a strawman. The article does not argue that there's no use-case that's solved by git or and DVCS. It's just that not every use-case is solved by git. I'd even go out and argue that most use-cases are solved just as good by a solid centralized system with less complexity involved.

> Just because we haven't had tools in this space with wide adoption that are user friendly doesn't mean that we won't eventually.

I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed. And I doubt that the fundamental complexity can be abstracted away.

jimktrains2 · on March 3, 2015

> That's now known as "I can't deploy because github is down". Note that the author proposes "local commits plus a centralized storage".

And in that cause you can scp your repo somewhere and change your origin and move on.

Also, do people really rely on 3rd party services for their deployed code? I've used github as a repo and colab tool at companies, but we still deployed from repos owned by the company.

> I'd even go out and argue that most use-cases are solved just as good by a solid centralized system with less complexity involved.

I'd go out an argue that because the added complexity of git is far outweighed by the benefits of a local repo.

mordocai · on March 3, 2015

Hell, for a ton of companies (including mine) everything is on 3rd party services/servers. We don't have any servers we can actually call our own.

kbenson · on March 3, 2015

Which is fine, but the very obvious and up-front trade-off of using servers that aren't yours is that they aren't yours, and everything that implies. There are plenty of benefits as well, but that's why it's a trade-off.

teej · on March 3, 2015

    Also, do people really rely on 3rd party services for their deployed code?

Yes, this is very common.

gfodor · on March 3, 2015

Also the fact that each developer has a full copy of the repo on their laptop lets me sleep at night wrt github exploding.

TeMPOraL · on March 3, 2015

Or your company exploding or pretty much anything exploding. When every dev has a full copy of the repo, you can restore all the work as long as a single dev machine is alive.

Not to mention that in case of an office Internet outage, you can just mesh together those repos and continue working. Knowing that gives me some peace of mind.

pyre · on March 3, 2015

Cloning doesn't require the full repo. Some of your devs may only have half of the repo. Look at the --depth option for git-clone.

silverbax88 · on March 3, 2015

Or, you could use Mercurial and sleep even better knowing that not only does every dev have a full copy, but does NOT have thousands of orphaned files of each revision or potential revision like Git does.

Xylakant · on March 3, 2015

It also means that you get to loose control over all data in the repo including all versions and history when a single person looses the laptop, even if it's a person that only required read access to a tiny part of the data.

It's a valid tradeoff to make, but still something I'd keep in mind.

angersock · on March 3, 2015

I'm going to go ahead and say that that's a red herring.

A successful business is built upon so much more than whatever code is in your repo.

Any competitors are probably more likely to look at the code and ignore it over their own home-grown solution than to adopt it wholesale...and if they can beat you with your own code, they would've beaten you without it, because they probably run a much better business game.

Trade secrets are like the stupidest reason ever to hobble your developers.

Xylakant · on March 3, 2015

I think you're wrong on that account, there's tons of pain that can be caused by such a loss. But I was thinking more about the SSL certificates and SSH keys and hardcoded passwords and other secrets that inevitably seem to end up in repositories. (Rails cookie secrets anyone?) Some of that data gets deleted but not purged correctly and oops, there's that one server that still accepts last years key, didn't we delete that from the repo? I've seen some repos that could easily create damage in the 6-7 figure range if they ended up in the wrong hands. Enough to sink a small company in any case.

wtbob · on March 3, 2015

All of that is configuration, not code, any should never be in any repo external to one's company. Nor, frankly, should it be available to the developers. That's all production-specific, and should be restricted to deployment.

If your developers are deploying into production directly from their laptops, you have a problem.

Xylakant · on March 3, 2015

So, admins never use version control for puppet manifests/chef cookbooks/terraform/saltstack/whatnot and all the config data that's put on the servers? And it would not be of value if we could version the configuration along in the same repository as the code to which it belongs? So it's easier to track? Now, currently we can't, because DVCS don't allow us to grant different privileges to different people/groups, but I could very well imagine that developers get access to a repo where they can change the configuration to the dev/test environments and the admins match that in the same repo for prod and other envs with higher level needs. I could also imagine building the final deployment from a single repo instead of having to merge two repos because I need to separate them due to permissions.

I could also imagine designers placing their assets in the repository without needing to deal with git or even seeing the source code that lives side-by-side. That totally used to work with SVN I'd totally like it if the repository contained the PSD sources for all the assets/mockups that are relevant for a given version of the source, but alas, you can't check out the source without getting the PSDs, even if you're not interested in them. Heck, SVN even allowed exposing the repo as webdav which you could mount as a networked folder, so people could access the last revision just as if it were a SMB drive.

git is a huge step forward in some regards, but we also lost quite a couple of good things along the way.

> If your developers are deploying into production directly from their laptops, you have a problem.

I never said they do.

angersock · on March 3, 2015

So, the performance of binary assets in SVN is exactly why we used to--when doing game development--check all the source files out into their own repository, and occasionally sync over.

As for the configuration being in the same repo as the code--again, why? That tells me that you aren't packaging releases, or if you are, your documentation is less useful as a reference than your code.

These are antipatterns, again, and "fixing" your VCS isn't going to help in the long run.

Xylakant · on March 3, 2015

> So, the performance of binary assets in SVN is exactly why we used to--when doing game development--check all the source files out into their own repository, and occasionally sync over.

The assets for game development likely are bigger than the assets in web development but we never had major issues with binaries. Things also improved much in later SVN versions (in the beginning the client was way to stupid and tried to diff binaries which obviously could not work)

> As for the configuration being in the same repo as the code--again, why?

Why not? I don't have that right now, but I'd like to have it. Because I could compare the state of the code and the state of the configuration with the state of the code easily at any time at any commit. I could use bisect to figure out at which point things started failing. Helps when you're trying to figure out when config and code started drifting apart. I can use google's repo tool or submodules or stitch together the config repo and the code repo by matching up dates or tagged versions, but that's all a hack. It's maybe the best hack currently available, but I still think there's room for improvement.

> That tells me that you aren't packaging releases, or if you are, your documentation is less useful as a reference than your code.

I can't imagine how you reach that conclusion. You could not be further from reality. Just because I'd consider it nice if I could have a single version identifier in a single repo that matches up both, config and code I'm not rolling that in a package? And how does that relate to my documentation?

wtbob · on March 3, 2015

> So, admins never use version control for puppet manifests/chef cookbooks/terraform/saltstack/whatnot and all the config data that's put on the servers?

Of course they do. But those repos should live on an internal repo server, and on the production systems themselves.

Copying production authentication credentials to a non-production machine should be a firing offence.

> And it would not be of value if we could version the configuration along in the same repository as the code to which it belongs? So it's easier to track?

Hell no. Code without configuration is like a gun without bullets: it's an interesting piece of work, but that's it (I'm not a huge believer in intellectual property, you can tell). But configuration is the keys to the kingdom. If a cracker has your app's code, he might be able to figure out some flaws in your protocols or your implementation, but if a cracker has your database passwords, your hostnames and IP addresses, your firewall configuration, your bastion hosts—then you're dead.

Xylakant · on March 4, 2015

Enlighten me, I'm seriously interested in knowing, because I'm missing the point:

Currently there's code in one repo and configuration in another. Both repos are on the same internal repo server. Both repos are accessible to multiple persons - some can access the code only (developers) and some can access the configuration repo only and some can access both repos (admin). A build process tags both repositories and builds an artifact that gets deployed.

How is that situation superior in terms of security over:

Code and configuration live side by side in the same repository that supports access controls. The repository is hosted on an internal repo server. The parts that are code are accessible to developers only and admins can access both parts. A build process tags the repository and builds an artifact that gets deployed.

The only point I could see is that with two repositories it's harder to mess up the authentication, but I doubt that's true. In both scenarios we have people with access to the configuration and those people will in a lot of organization have a copy on their laptop that they carry around. That's how most people use version control. In both setups it would be possible to only ever handle the sensitive data on a remote system, but that's a property of the workflow and not a property of the VCS used. I seriously don't see the issue with a shared repo - if it supports access controls.

gnaffle · on March 3, 2015

Well, you do that anyway when you allow people to check out a local copy of the code. Just as in most VCSes, you can set up a git server to only allow checkout of specific branches.

Xylakant · on March 3, 2015

>Well, you do that anyway when you allow people to check out a local copy of the code.

You grant that permission to the person legitimately checking out code, but not to the person finding or stealing a laptop with a clone of a repository. The latter is a side-effect of how a DVCS works. In SVN you don't even need to expose the full history, you can grant access to the last revision only.

> Just as in most VCSes, you can set up a git server to only allow checkout of specific branches

In SVN for example you can restrict people to single directories (or even files - I don't remember exactly). That at least is impossible in git. I can prevent pushes using hooks but not reads.

gnaffle · on March 3, 2015

> You grant that permission to the person legitimately checking out code, but not to the person finding or stealing a laptop with a clone of a repository. The latter is a side-effect of how a DVCS works.

I'm not sure what you're getting at. What difference is there (not that you would allow checkouts on unencrypted laptops anyway)?

> In SVN you don't even need to expose the full history, you can grant access to the last revision only.

> In SVN for example you can restrict people to single directories (or even files - I don't remember exactly). That at least is impossible in git. I can prevent pushes using hooks but not reads.

These restrictions may be useful in some cases, but I would wager that they are far more seldom than some of the advantages of git (like being able to work offline).

Xylakant · on March 3, 2015

> I'm not sure what you're getting at. What difference is there

A checkout from SVN/CVS only contains the last version. Files that were deleted in an earlier version are only on the server. A clone of a DVCS contains all versions and all files that ever were in the repo (unless you use BFG or git-filter-branch, but people tend to forget that). So a clone can contain secrets that people are not aware of, such as accidentally committed and deleted files. An interested party could find stuff that you're not aware off by looking at HEAD.

> (not that you would allow checkouts on unencrypted laptops anyway)?

That's not my call to make, but I agree on that regard. Reality sadly different from what we both wish.

gnaffle · on March 3, 2015

> An interested party could find stuff that you're not aware off by looking at HEAD.

Well, that goes without saying. But I don't think that security argument is a very poor one compared to the huge benefit of having the history locally to inspect.

We've had instances where secrets were committed to local repositories by accident. It never got past review and into the master branch. If it had, we would probably had taken the effort to rewrite that commit out of the history.

Xylakant · on March 3, 2015

> Well, that goes without saying. But I don't think that security argument is a very poor one compared to the huge benefit of having the history locally to inspect.

If you go further upthread you'll find that I said "a valid tradeoff, but one I'd keep in mind"

> We've had instances where secrets were committed to local repositories by accident.

That's laudable, but countless examples show that not everyone is that diligent. I'd love if I could lock down some parts of some repos so that they're only accessible by people that I have an elevated level of trust in. (and where I can enforce a certain security level on the laptop).

gnaffle · on March 3, 2015

> That's laudable, but countless examples show that not everyone is that diligent.

Sure, then again I would guess that the ones who are not that diligent are not likely to apply those access restrictions that you mention (although the "one revision" advantage is something they would get "for free" with SVN).

Xylakant · on March 3, 2015

In any given larger organization there are people that have exert control over only parts of the whole. I could possibly argue to tighten down security on parts of a repository for some people within boundaries (like declare some folders as unreadable to some folks that don't need access to those) but I can't deny them all access since they need some of the content stored in the repo. With git that's currently all or nothing which exposes a flank that I'd prefer closed. In this particular case it's not a terrible issue, but for other folks with other data that can quite well be, so the tradeoffs may end up being in favor of SVN. I can imagine that that's one of the reasons I still see SVN deployed in corporate installations.

stcredzero · on March 3, 2015

I'd go out an argue that because the added complexity of git is far outweighed by the benefits of a local repo.

But for precisely what use cases or in what situations? If one is a novice and just want to have source control and versioning, centralized systems are going to have the better cost/benefit.

sanderjd · on March 3, 2015

> If one is a novice and just want to have source control and versioning, centralized systems are going to have the better cost/benefit.

This seems completely backwards to me. When I was in college, I thought it was a giant pain to source control my homework, because the options I was aware of (since all I knew about was SVN) were 1) public hosting on sourceforge (which was itself a pain to use), which professors were not keen on, 2) find a private host somewhere (not sure I ever did), 3) run my own server somewhere, 4) configure and run a local server. I went with (3) and (4) but it was never an easy or novice-friendly solution, and mostly I just didn't version control things. The local version control model of DVCS would have been much easier.

It's only when working with other people that any extra complexity even rears its head at all, which doesn't come up if you're a novice who just wants to have source control and versioning.

Xylakant · on March 3, 2015

a purely local svn repository never needed a server.

svnadmin </path/to/repo/on/local/disk> followed by svn checkout <file:///path/to/repo/on/local/disk> is all it ever needed.

joeyo · on March 3, 2015

Same for git, though.

Xylakant · on March 3, 2015

> And in that cause you can scp your repo somewhere and change your origin and move on.

If you happen to have - god forbid - a php app that pulls dependencies via composer you have a hard dependency on github since composer pulls practically all code from GH. I don't consider that a good idea, but that's how it currently is. [No, I don't do php and only sometimes have to clean up the resulting fallout]

structural · on March 3, 2015

If particular frameworks want to tie themselves to a single server/host, that's their problem, and is specifically not a problem with any of Github, git or DVCSes generally.

You really cannot stop all people from being idiots all of the time.

ksenzee · on March 3, 2015

That's not a hard dependency, it's a sensible default that you should be overriding if you're running composer install as part of your deployment process instead of including composer's vendor directory in your repo. Documentation: https://getcomposer.org/doc/05-repositories.md

Xylakant · on March 3, 2015

Practically all packages use git sources. Yes, you can vendor them - congratulations, you just kicked the can down the road and moved the issue to the build step. The issue exists for other languages as well - maven/mavencentral, ruby/rubygems.org - but only composer depends on github very much. I don't like that but it's not like the other solutions are much better.

mercurial · on March 3, 2015

Many companies mirror maven, so having mavencentral go down is not a big issue. Does composer make it possible to deploy a mirror of its dependencies?

ksenzee · on March 3, 2015

Yes, certainly: https://getcomposer.org/doc/articles/handling-private-packag...

Xylakant · on March 3, 2015

You can use your own repository, so at least in theory that should be possible. I haven't done that in practice, so I can't comment any further.

erikb · on March 3, 2015

I would bet its dependency is git not github.

iand675 · on March 3, 2015

> Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of month, so I have to push them to a server and somehow link them.

git-annex is a pretty good solution for this: https://git-annex.branchable.com/

> That's a strawman. The article does not argue that there's no use-case that's solved by git or and DVCS. It's just that not every use-case is solved by git. I'd even go out and argue that most use-cases are solved just as good by a solid centralized system with less complexity involved.

I'm not convinced that it is a strawman– I'm not an uber-developer, but I've had to do it reasonably often. A common case is that the originator loses interest, and you still need to do maintenance on it. I'd rather my go-to choice of a version control system support that notion rather than learn a different tool to deal with this case.

> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed. And I doubt that the fundamental complexity can be abstracted away.

Sure, there is an additional level of abstraction, but to argue that it's insurmountable seems rather pessimistic. Most technologies that we take for granted now required multiple decades to get to a consumer-friendly state. This is perhaps a philosophical difference that only time will answer.

namdnay · on March 3, 2015

I'm not a fan of solutions like git-annex, every time I've seen people try to do this type of stuff, two years down the line they end up having built a buggy knockoff of maven/gradle. Why not do things cleanly from the start? The entry barrier is really not that high, and the benefits are huge.

Then again maybe I'm a bit psycho-rigid on build management :)

stsp · on March 3, 2015

You're absolute right from my experience. I"ve seen many shops using svn:externals to tie build dependencies together. And now that git is starting to see wider adoption within industry (not just software companies) the same is attempted with git submodules and git annex. It's horrible because this is not the use case these features were designed for, but people try anyway and eventually run into road blocks.

This tends to happen especially in shops that migrate off process-heavy monsters like clearcase and MKS. These tools do a lot more than just version control, and no open source version control system by itself matches what users of these systems expect. You have to throw at least an issue tracker and a good build tool into the mix as well.

sytse · on March 3, 2015

Totally agree, big organizations need something integrated. With GitLab we include the issue tracker and build tool you mentioned as well as having git-annex for large binaries.

Xylakant · on March 3, 2015

> The entry barrier is really not that high, and the benefits are huge.

Because of "we'll just add a submodule and we don't need a build system and it's only a single one and we'll replace that with a proper build tooling as soon as we have time". And then it's only a second one and suddenly submodules pop up like mushrooms after a light summer rain. It requires planning and forethought, just like a lot of things that should be an obvious huge benefit, but planning and forethought are in short supply in this industry.

jrochkind1 · on March 3, 2015

What are you suggesting the right way from the start is? Just using maven/gradle?

namdnay · on March 4, 2015

A million times yes, if only for the dependency management side. Can you remember the dark ages of having to download libraries from the internet and check them into your source? Arghhhhh

_wiv7 · on March 3, 2015

git-annex is a tool for versioning files with git that you don't want to actually store IN git. It is not a build automation system.

hosh · on March 3, 2015

Have you checked out ipfs?

kbenson · on March 3, 2015

That's now known as "I can't deploy because github is down".

git != github. I use git every day at work. I don't use github for that repo. If I did, I wouldn't make reliance on it part of my deployment strategy. If I did do so, I hope I would realize that any problems relying on github resulted in were ones I created with that choice. Hopefully there would be some useful benefits from that choice as well.

deeviant · on March 3, 2015

> Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of month, so I have to push them to a server and somehow link them.

Why in the world would you check in build artifacts into source control, especially those that clock in at 300-500 MB?

Crito · on March 3, 2015

Because you are accustomed to centralized version control systems that do not punish you for doing something that dumb, and refuse to learn new tools when you are forced to use them.

Anyone checking in 500MB artifacts into git is almost certainly refusing to use git correctly.

It is like somebody who grows up using hand tools, gets handed a power drill, then tries to use it like a pry-bar. You should not use a screwdriver as a pry-bar, that is an abuse of the tool. Nevertheless, many people abuse screwdrivers as pry-bars because conventional hand screwdrivers tolerate this practice.

Xylakant · on March 3, 2015

Thanks for assuming I'm an idiot without knowing the constraint I'm trying to fulfill ;)

erikb · on March 3, 2015

After a few days please review that comment of yours. Could anybody in an internet conversation really know all details of your constraints?

Maybe, maybe(!) there is an argument were you would need to version huge binaries which you could generate out of the same already versioned sources, but even if there is, there are methods to backup huge blobs, and git is simply not one of them.

Xylakant · on March 3, 2015

> Could anybody in an internet conversation really know all details of your constraints?

No - nor do they need to. But everybody on the internet could just assume that _I_ know them and refrain from saying that "I'm doing something dumb" and "refuse to learn new tools". So I'd also ask you to review the comment of yours - you're doing the same thing - you imply that "Maybe, maybe(!)" there might be a use-case, so as a matter of fact you doubt I did review and choose my tools.

I reviewed my comment, and I'm at peace with it.

s73v3r · on March 3, 2015

In that case, why did the parent poster assume anyone trying to check in a large binary file is "an idiot"?

erikb · on March 4, 2015

The "idiot" part of course is wrong. But if you understand a technology and then use it for something it is explicitly bad at, than it's objectively a bad idea. The chance is very low that you would not be able to solve a binary sharing/backup problem with another tool that's made for these tasks.

Crito · on March 3, 2015

> "almost certainly"

You might have a legitimate reason for putting 500MB artifacts into a git repo, but I reeeaaally doubt it.

It is a poor craftsman who blames tools that he is intent on misusing.

Xylakant · on March 3, 2015

You're putting words in my mouth that I did never say. I'm not blaming git for anything nor do I consider git a bad tool. I'm using git for practically all my versioned data. I just don't consider it the tool that needs to solve all cases where I - legitimately or not - want to version data. A good craftsman should have more tools at his disposal than just a blunt hammer.

So you can doubt my use case but I consider it severely impolite to pretend that you know better. Actually you're quite nicely illustrating one of the points in the article.

wtallis · on March 3, 2015

And you're still not even trying to offer a real explanation, just engaging in a flamewar.

Xylakant · on March 3, 2015

It's all discussed further down in the tread, and I gave as much info as I could. I'm not at liberty to discuss details in public - nor do I need to.

Further down on the page somebody else also mentions a legitimate use case to version-control large binaries (needed for comparison). Another use-case I've seen is version control rendered video output and keep the comments and metadata attached to the versions. Works just fine with SVN, fails hard in git. Yet another use case for a system that handles binaries better is what the rubygems folks do - they vendor all gems that a particular version of rubygems.org depends on so they can bootstrap without rubygems.org being available. They built a custom solution using multiple git repos which works for their use-case (it's been discussed on rubygems). Arguably, having a system where versioning large amounts of binary data works better than in git would have prevented that issue.

So there are use-cases that are ill-suited for what git can and cannot do - and just because I say I have one I get to be called a bad craftsman by someone who doesn't know the least bit about what I'm trying to do?

And now you're saying that I engage in a flamewar when I point out that I consider that an insult? Please note that I have not insulted Crito.

s73v3r · on March 3, 2015

The real explanation is that someone would want to version binary data. Like say, images. The parent poster claimed that anyone who would want to version binary data is an idiot.

sgustard · on March 3, 2015

The original article addresses this.

"These are large, opaque files that, while not code, are nevertheless an integral part of your program, and need to be versioned alongside the code if you want to have a meaningful representation of what it takes to build your program at any given point."

pyre · on March 3, 2015

In this case, git isn't the right tool for the job. Large gaming companies, IIRC, use Perforce for this reason.

lilyball · on March 3, 2015

> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed.

Rebase and DVCS are orthogonal concepts. Rebase is a consequence of offline commits, not of decentralization. And I greatly prefer dealing with a `git rebase` issue than the equivalent in Subversion, which is to have your uncommitted local changes get irrevocably modified by the `svn update`, with conflict markers placed in your code and no way to abort the whole process.

pyre · on March 3, 2015

But... but... but... git bad!

lilyball · on March 3, 2015

It's a prevalent meme, isn't it? I'm surprised at how many people never even realized how fundamentally dangerous it was to issue an `svn update` with local changes. And perhaps less seriously but still problematic, how `svn commit` can put the repository into a state that never existed on any developer's machine (because it effectively does a rebase on the server), which is not always safe to do even when there's no merge conflicts.

Xylakant · on March 3, 2015

> how `svn commit` can put the repository into a state that never existed on any developer's machine

git add -p creates the same issue for git (a commit that never existed on disk) and git rebase with squash and/or reodering of commits does as well. That's usually considered best practice.

The problem I often observe with git (pull -)rebase is that people have a hard time wrapping their mind around the fact that all commits change and when they change and in what state the checkout is are when they get a merge conflict (remote HEAD plus whatever was already applied). The conflict is often unavoidable and you can argue that rebase is making it easier to resolve the conflict, but it's harder to reason about than "ok, update does a merge and I get whatever is on the server plus my local changes".

lilyball · on March 4, 2015

The difference between `svn commit` and the various forms of `git add` or rebasing is that the developer can test the commits (or visually inspect them) before pushing. In fact, the argument that `git add -p` produces a commit that didn't exist on disk is exactly the same as the argument that developers can commit code that they never actually built & tested. Which is to say, the important point isn't that the commit tree existed in isolation on the developer's disk, but rather that the developer did any testing necessary to verify that the commit is good. And the reason why `svn commit` is bad in this regard is because it creates the never-before-seen result on the server.

> The conflict is often unavoidable

If the conflict is unavoidable, then you're going to get a conflict regardless of how you go about it (rebase vs merge vs `svn update`). The difference is with `svn update` you can't abort the whole process and start over if you need to, because you've already permanently lost the previous state of your work tree. Whereas with `git rebase` and `git merge`, you can abort and you'll get the exact state you had prior to the command.

angersock · on March 3, 2015

Why are you putting build artifacts into source control?

Xylakant · on March 3, 2015

Because they're intermediary steps of a process, regenerating them takes an hour and I don't feel like setting up everybody's environment to build them. Most people don't need the capability, but they need the result.

Why are you asking?

namdnay · on March 3, 2015

I think the question asked was closer to "why aren't you using an artifact repository?"

Nexus is pretty good, but if the language you are using isn't integrated well with gradle/maven you can always just use a shared drive fed by jenkins builds.

TeMPOraL · on March 3, 2015

> Nexus is pretty good, but if the language you are using isn't integrated well with gradle/maven you can always just use a shared drive fed by jenkins builds.

Here's where I start to have problems with contemporary development culture. You mentioned using Nexus, Gradle, Maven and Jenkins where the guy just want to keep some binaries along with the source code they're generated from.

We're complicating things beyond reason nowadays.

jrochkind1 · on March 3, 2015

To bring it back to the OP, this argument is in fact represented in the OP.

OP is arguing that these (having to use Nexus, Gradle, Maven, and Jenkins just to keep some binaries along with the source code they're generated from) are workarounds to limitations in git that ideally would not be there (and don't neccesarily have to be there, and aren't there in all VCS's), and the OP mentions that instead git fans want to claim "No, that's just the way git SHOULD work, you SHOULD need to go use an 'artifact repo' in addition to git to keep a few binaries with your source code" instead.

I tend to agree with the OP.

angersock · on March 3, 2015

Agreed.

That said--and this is without knowing the exact build and tooling environment, so I may well be giving advice inappropriate to the situation at hand!--the second part of that "keep some binaries along with the source files they're generated from" is kind of an antipattern.

If it takes too long to generate them from source, each time erry time, they need to fix that issue--least of all because slow builds mean slow testing, and slow testing means no testing.

That's why I spent two days earlier this year moving a 30-minute build down to a 2.5 minute build.

Xylakant · on March 3, 2015

Yes, your advice is sound for the general case but inapplicable in this specific incarnation of the problem.

angersock · on March 3, 2015

Out of curiosity, what is the gist of your setup? Why is this incarnation such a departure from the general case?

Xylakant · on March 3, 2015

The output of the build is basically a tool in itself. So most people don't need the build process, just the resulting tool. The input changes on pretty much a monthly basis and is not easily versioned. I could set up all dev machines to support the build and everybody could build it themselves from sources, but that would require me to

  * install all the required instruments for the ritual rain dance
  * teach the whole team how to do the ritual rain dance
  * support the people that break an arm or a leg doing the ritual rain dance

So I prefer perform the dance on my machine, collect the tool and point the main dev environment to the right location. It's all scripted, so I kick off the job and grab a coffee. However, I want to keep old instances around so I can track when bugs crept in, so I can't just go overwrite the result, so I need to adjust the pointer every time. If I could just check the tool in with the regular dev setup that would be much easier, but - since we're using git - that would blow up quite quickly and overwhelm my disk space. (And folks would kill me for filling up their disks as well - rightfully so). That's something SVN or another centralized VCS would handle much more gracefully. In a gist, I have a use-case for fairly stupid versioned file storage with a push/pull api. No complicated merging, no branching, nothing. git-annex could do, but is overkill.

There are some better solutions to what we're currently doing, but there's so many yaks to shave and so few razors.

pyre · on March 3, 2015

> That's something SVN or another centralized VCS would handle much more gracefully.

Subversion handles this gracefully because you don't download all of the repository's history to your machine. It's a trade-off that you're talking about here. Most people are ok with losing the ability to check in 500MB files in order to gain the decentralization of having a full copy of the repo (and not needing to query the server just to view history).

gknoy · on March 3, 2015

Thanks for such a good explanation. When I first saw artifacts of things generated by code in our repo, I had a big WTF moment, but it made a lot of sense once someone pointed out that it was Rather Handy for catching bugs in the code that does the generation.

s73v3r · on March 3, 2015

What if, instead of binary build artifacts, they were images?

pyre · on March 3, 2015

> We're complicating things beyond reason nowadays.

There isn't always a way to dumb things down to the level that people would like. It would be so much easier to get to work if I could fly in a straight line between work and home, but I don't complain that the world doesn't accommodate me.

jsight · on March 3, 2015

He also mentioned using a simple shared drive. It would have been difficult to know which of these solutions was the right level of complexity and capability without knowing more context (a point which he seemed to make clear).

cssmoo · on March 3, 2015

Shared drives are mutable which is a big problem with build output.

Xylakant · on March 3, 2015

No, I didnt. I mentioned a shared location. It's not an networked drive, I'm not insane.

cssmoo · on March 3, 2015

Fair point. Comment retracted. My apologies.

Xylakant · on March 3, 2015

Valid question: Because the tooling does not think in terms of artifacts that are versioned in repositories, but rather in terms of "files" that are in a given location. I'm using a shared location, but every build requires modifying another file to point to the now current version. It's all solvable, but the easiest solution would be to just version the result.

I could fix the tooling, but alas, I have other yaks to shave. It's an imperfect world.

pyre · on March 3, 2015

> but every build requires modifying another file to point to the now current version

If you could version the file in git, you would have to check in the new version of the file, so it's not like you're adding a step to have to update (e.g.) a symlink.

Xylakant · on March 4, 2015

But I have a lot of files lying in a shared location that are named build-<datetime> and I can accidentally break revisions in the repo by moving/renaming/deleting any of these. That may be a feature to some people, but that's something I consider a weak spot. It's brittle and prone to breakage and I dislike brittle.

tedunangst · on March 3, 2015

People get pretty religious about what should (or not) be checked in to version control.

Xylakant · on March 3, 2015

I can tell from the downvotes :)

pyre · on March 3, 2015

It's designed with a particular use-case in mind. When people complain that square pegs don't fit into round holes, it makes more sense for them to step back and evaluate what they are trying to accomplish, and the tools they are using the accomplish it.

phkahler · on March 3, 2015

>> Because they're intermediary steps of a process, regenerating them takes an hour and I don't feel like setting up everybody's environment to build them. Most people don't need the capability, but they need the result.

Put your build tools in a repo. It should be easy to set up a new developer with a complete build system by getting it from the a repo. Now use make or anything that can check dependencies so those things are not regenerated unless they need to be. Always check in a freshly generated file along with its source.

Too · on March 3, 2015

Setting up a build system can be a bitch. Not every developer needs to be able to build every obscure module of all your companies tooling. A .net developer shouldn't have to rebuild c++ boost just because he relies on a small native dll.

In some cases the tooling of one developer can even conflict with the one of another, one requires python 2.7 to be in your PATH while the other requires 3.3, etc.

I fully agree that it should be easy to set up but the reality is different, especially when you deal with legacy, rely on third party libraries or build open source projects from source.

icefox · on March 3, 2015

What is annoying is when build artifacts are checked into source control systems along side of the source. There is very little way to exclude them or only check out the binaries, you can only have both. This happens all too often and I don't care if you want to keep using subversion you really should stop doing this.

/src/foo.c /src/foo.o

Gorkys · on March 3, 2015

Not the OP, but in our product we have very large binaries that are the output of the product itself. Each time a change is made to the program, the output needs to be compared to the existing output first automatically and if the output isn't byte-by-byte identical then manually (visually).

So essentially, without large binaries in source control we'd lose testing.

erikb · on March 3, 2015

I think you argue in the same direction I often argue. Git is not for everybody. If you don't want to spend the time to learn git you probably are better of using something else.

Additionally I would like to add something: There are more cases were you need than you think, especially if you haven't learned it. E.g., you start a little prototype to present to your boss. It succeeds, boom, ten years later you have 100 man team working on the same prototype you started 10 years ago. Suddenly you really need forks, branches, some people spend their whole day merging stuff, etc. But hey, you can't switch from SVN because you have chosen it in the beginning and now everybody is using it, all kinds of scripts, tools, and optimized workflows require that you continue to use it.

Some people might think that spending two weeks in the beginning to learn to use a power tool is much better than starting earlier and thereby getting years of pain later on.

semi-extrinsic · on March 3, 2015

I agree: whenever you find yourself saying "I don't have time to learn this advanced tool that will make my life better, I have to ship now!", you should really take a moment and carefully verify that this is a good choice. (Sometimes it is a good choice though.)

In the world of science, I have colleagues who have said this for ten years wrt. LaTeX vs. Word, or even just learning EndNote vs. typing up reference lists by hand. I cringe every time I watch them spend an order of magnitude more time on pure overhead, and always getting stuff wrong, for every single paper they put out.

erikb · on March 4, 2015

That's a really good reason: When you are unable to learn it correctly (maybe simply because it's too complicated). With the simple tool you are still not able to do all the cool stuff, but at least you can get something simple done fast.

lmm · on March 3, 2015

> That's now known as "I can't deploy because github is down".

I had that happen a few months ago. Had to push the repo to bitbucket and change where a config file pointed to. It took almost as long as doing a merge in SVN.

> I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of month, so I have to push them to a server and somehow link them.

Those are very unlike source code. You're not going to want or be able to diff them, or blame them, or view their history. A different tool is appropriate.

> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain.

Yeah. I see that at about the same rate I saw developers lose data back in the svn days. Git can "wedge itself" but never in a way that induces data loss, IME, in stark contrast to its predecessors.

pyre · on March 3, 2015

> view their history

You can view the history of binary blobs in a git repo.

s_q_b · on March 3, 2015

The main reason for the abandonment of centralized version control was as much about Git and Mercurial having the advantages of DVCSes as it was about SVN starting to show its age.

I found SVN literally painful to use, perhaps because it was based around CVS's standards, and CVS was originally a series of shell scripts written in 1990.

pyre · on March 3, 2015

> That's now known as "I can't deploy because github is down".

No. No it's not. "I can't deploy" is different than "I can't work," unless you "move fast and break things" by deploying as soon as you push your code. Even then git allow you to continue to work, even if you haven't deployed your code.

HelloNurse · on March 4, 2015

"I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain."

Accidents happen, but the pattern of making people make highly predictable mistakes and then calling in the experts is the result of not investing on basic training (and possibly not hiring smart enough people, but it's another matter).

By the way, that "wedged" repository in all likelihood has an uncorrupted working copy containing all changes and can be repaired offline: Git is technically so superior to Subversion and TFS that it isn't even fun.

mikeash · on March 3, 2015

How do you get local commits plus a centralized storage without basically building a DVCS?

Xylakant · on March 3, 2015

Local commits do not require the full history, only a known base. So you could in essence store the commits relative To the last pull from the server and push those later. No need for full history.

marcosdumay · on March 3, 2015

A DVCS without complete history is still a DVCS. In fact, what makes a VCS distributed is much more local commits than local history. At least on my opinion.

But well, anyway we decide to call that beast, a VCS with distributed commits and centrilized (fetchable) history, with a simplified evolution graph (created by the point of view of a central server) would bring 99.99% of the benefit of a centralized VCS, 99.99% of the benefit of a DVCS, and almost no problems from any of them.

pyre · on March 3, 2015

git practically already has this with the --depth option to git-clone. The only thing that it's missing is the ability to enforce storing the last single commit.

I'm sure that you could probably build the system that you are describing on top of git, but you would still run into problems with large files and local commits, even if it was just "commits since last pull/push."

marcosdumay · on March 4, 2015

Yes, git almost has the shallow repository done. It's even in the article, and that "almost" means it's almost trouble free, but will catch everybody off-guard once in a while.

The other part about the simplified evolution graph is completely missing. I don't think one can solve that without completely rewriting the protocol.

lmm · on March 3, 2015

You could indeed do that. But now you have two different kinds of commit that can conflict in surprising ways. I very much doubt you could manage this in a way that's simpler than what git does (remote branches and local branches are branches, you can merge one into the other using the normal tooling you use for merging branches).

Xylakant · on March 3, 2015

I'm fairly sure that at least on the ui-side you could improve over git, but the goal is not necessarily lower complexity. A semi-distributed system could have other advantages, for example fine-grained access control (read permissions on folder or file-level), support for partial checkouts, better support for large binaries (only the last revision on the local repo, older revisions on the server) etc.

git currently can't do that because of the way it's designed and built.

erikb · on March 3, 2015

I have to agree here. You only need a startpoint. In essence a commit is a diff on that startpoint. So "cloning" a single commit should be enough to create a new one and push it.

falsedan · on March 3, 2015

git has 'shallow' clones, where it checks out the tree as it was at a particular commit, and has a dummy commit which replaces the history of the repo up until that particular commit.

pron · on March 3, 2015

A semi-distributed DVCS[1], which is what the author actually advocates, allows for local commits and work with no connectivity.

[1]: https://code.facebook.com/posts/218678814984400/scaling-merc...

masklinn · on March 3, 2015

> The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap. Another point worth mentioning is that you can make shallow clones with Git.

And for a third, it opens options which are not available without that. Annotating or bisecting with server round-trips every time is not really an option.

CHY872 · on March 3, 2015

I disagree - I've done bisects on very large codebases, and they take a while (5-10 seconds for a checkout, perhaps). This is all immaterial compared to the time to recompile the code (possibly a minute or two to get everything recompiled (a Java project)).

jsight · on March 3, 2015

I have worked on plenty of projects where both of those would have taken much longer than that with a non-DVCS.

tieTYT · on March 3, 2015

> Nothing precludes you from using the patch model with DVCS (I mean, Linux kernel development uses Git just fine with this)

Technically, no. But how many projects out there would accept an email patch? They'd probably reject it and tell you to issue a Pull Request instead.

I think his greatest argument is comparing the steps of contributing to github vs contributing to an svn repo.

masklinn · on March 3, 2015

> Technically, no. But how many projects out there would accept an email patch?

Mercurial works with email patches. Not only would they accept it, that's the only way to contribute, sending emails to mercurial-devel.

> They'd probably reject it and tell you to issue a Pull Request instead.

Obviously you're supposed to use the project's workflow, but the point is nothing prevents you from setting up a patch model with a DVCS. Quite the opposite in fact, both Git and Mercurial have facilities for automatically formatting and sending patchsets, and for applying trucktons of patches.

Ref: git am, git format-patch, hg export, hg import and hg email

perlgeek · on March 3, 2015

> Technically, no. But how many projects out there would accept an email patch? They'd probably reject it and tell you to issue a Pull Request instead.

That's more a social issue. How many projects accept patches that go against their submission guidelines? Or coding style guidelines?

> I think his greatest argument is comparing the steps of contributing to github vs contributing to an svn repo.

I found that particularly weak.

Let's look through them in more detail.

    1. Get a copy of the source code
    2. Make your change
    3. Generate a patch with diff
    4. Email it to the mailing list
    5. Watch it get ignored

Wrong. You can't generate a diff, unless you first made copy of the original sources, or re-download/unpack it. So there's an essential step missing.

And often enough, you simply don't have access to the svn repo to do step 1.

    1. Fork the repository on GitHub
    2. Clone your fork of the source code
    3. Make sure you’re on the right branch that upstream expects your patch to be based on, because they totally won’t take patches on master if they expect them on dev or vice-versa.
    4. Make a new local branch for your patch
    5. Go ahead and make the patch
    6. Do a commit
    7. Push to a new branch on your GitHub fork
    8. Go to the GitHub UI and create a pull request
    9. Watch it get ignored

1. is github specific. Gitlab and Bitbucket don't require that

3. applies to SVN projects too.

4. is optional (though highly recommended)

But it gets really interesting when you want to do a second, separate patch. Do that svn when you can't commit directly? well, either throw away your first set of changes, or make a complete copy of your whole checkout.

Animats · on March 3, 2015

"Watch it get ignored"

"Submit a patch" is open source's way of telling you to fuck off.

The Github business of creating a whole publicly visible fork just to submit a patch is a bit much. I have some obsolete forks on GitHub which I need to kill off so someone doesn't try to use them.

Xylakant · on March 4, 2015

Even worse is when people are actually using them because they liked one of your PRs that was never accepted and yell at you when you kill it.

mahyarm · on March 3, 2015

You could of added a few steps, like find out what is the right svn branch to commit source code to, because they might only accept patches to the dev branch vs the trunk branch for example.

tankenmate · on March 3, 2015

git has tools to accept email patches, it's just most people don't use it. I'd accept email patches as long as they merge in a sane fashion.

qznc · on March 3, 2015

There are also projects that reject pull requests and require an email patch. Different projects, different work flows.

I'm agree with the author that the Github model of "always fork the repo public" is stupid. Why not simply push to the official project repo and (for unauthorized people) let it show up as a "pull request"?

erikb · on March 3, 2015

I wouldn't be surprised if there are still more repos with >100 maintainers who mostly receive mail patches. Just because you haven't grown up learning them doesn't mean they are not bigger than all you know, right?

lmm · on March 3, 2015

It's a fake comparison. It only looks like fewer steps because he's not counting how many steps it takes to send an email with an attached file. Not to mention joining a mailing list.

marssaxman · on March 3, 2015

The idea of a "pull request" is a github specific thing, isn't it? Email patches are the normal way of contributing changes in the distributed projects I'm familiar with.

erikb · on March 3, 2015

Actually the original idea for a pull-request is an email send from your local (but accessible via internet) repo to the original repo maintainers that asks them to fetch changes from your repo and merge them.

I think the name of the tool is git pull-request.

marssaxman · on March 4, 2015

Ah, it's git-request-pull. Thanks, I'd never heard of it.

wallyhs · on March 3, 2015

From the conclusion at the end of the article:

"We aren’t going to abandon DVCSes. And honestly, at the end of the day, I don’t know if I want you to. I am, after all, still a DVCS apologist, and I still want to use DVCSes, because I happen to like them a ton. But I do think it’s time all of us apologists take a step back, put down the crazy juice, and admit, if only for a moment, that we have made things horrendously more complicated to achieve ends that could have been met in many other ways."

In the next paragraph, the author links to a post that explains how Facebook will soon experience productivity bottlenecks because of their repository size. That post also explains why they don't want to split up their repository and that, "..the idea that the scaling constraints of our source control system should dictate our code structure just doesn't sit well with us."

These are not UX gripes and the problems aren't solved by adding more cheap storage.

Justsignedup · on March 3, 2015

Remember the days when working from home using Clearcase meant 3-minute right clicks?

Peprage farms remembers.

Sorry shelshock.

gcb0 · on March 4, 2015

> UX issues

this is exactly what the author is saying. and he covers your post there much more concisely. all use cases you mentioned are used infinitely less used than the time you want things git make more cumbersome. that's the while point and i think you made your reply after reading only the title of the article.

but the trolling succeed. everyone replied. sigh. even me.

jimktrains2 · on March 3, 2015

> Note: we fell in love with DVCSes because they got branching right, but there’s nothing inherently distributed about a VCS with sane branching and merging.

No, I fell in love with it because it was distributed and I could work without an Internet connection, which aren't prevalent everywhere, and even in my house, in a large city, can be iffy. By work I mean things like blame, bisect, log, &c not just committing.

> Let me tell you something. Of all the time I have ever used DVCSes, over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times.

Well lucky him for only being able to code when he has a nice connection, not all of us do.

Also, I like the distributed aspect as well. I like not having to give people commit access to my repo for them to have a proper dev env and then they can send me a patch or a PR and we can incorporate their changes. How would they be able to make any commits or anything otherwise without access to my repository?

Xylakant · on March 3, 2015

I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

There are a lot of companies that actually would prefer if the code never left the premises and have a use-case for finer grained permissions (some folks can only touch the assets, others can only ever see the dev branch, can't see history,...), things that are by definition not possible in a DVCS.

Storing large assets in git sort of suck and requires ulgy hacks. I'd love to version the toolchain and the VM images for the local development environment, but that's just not feasible with git.

I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OS development, but it's advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigbit) and only ever work from there.

amyjess · on March 3, 2015

> I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OS development, but it's advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigbit) and only ever work from there.

Yes, I think what a lot of people forget is that git was designed specifically with the Linux kernel in mind. Linus wrote it to the workflow of his project without much regard for what other projects do. That's fine; there's nothing wrong with that at all. It just means that it's not suitable for every project, and that's a good thing: different types of projects should use tools that are actually designed for them.

It also explains why git has such a strong learning curve: it was written for kernel hackers. The only people who were expected to use it are the kind of people who are used to delving deep into the nitty gritty. It's why I'm kinda disappointed GitHub became the dominant public source code host, because Mercurial is IMO much better at actually being penetrable to new users. I think people who are mostly familiar with SVN would be far, far more at home with Mercurial than git.

silverbax88 · on March 3, 2015

Mercurial beats Git hands down across the board for me. I've worked on so many projects where the initial development was tossed into Git and then the devs spend three days trying to get their codebases synced and each using a different tool which may or may not implement the core commands of Git.

Mercurial? Works everywhere much more simply, even ties into .NET with VisualHG and gives a better version/branch management than TFS. And doesn't mismanage disk space like Git.

Mercurial + BitBucket is the cleanest, fastest way I have right now for adding devs to new projects. I avoid Git because so few people (ESPECIALLY those who have only used Git) understand source control well enough not to make a mess of it.

erikb · on March 3, 2015

> each using a different tool which may or may not implement the core commands of Git

That's really one of the core problems of using git and why it's not for everyone. If you want a tool to do your job then git won't make you happy. Using the core tools and learning how it works _inside_ is the only way to make it work efficiently.

If you want/need a bike (something easy like svn), use a bike. If you want/need to use an air plane (git) you need to learn how to fly and that costs a lot of time. Putting something on the plane to make it look like a bike (a tool that may or may not enable all of git but probably not) won't suffice in either case.

acdha · on March 3, 2015

> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

Maybe your team isn't a bunch of road-warriors but networks still drop packets, servers get overloaded and many people work remotely using imperfect VPNs & ISPs. It's really easy to forget how much time that used to waste but switching to Git meant that we no longer had daily chatter when any of those flared up & people just got on with life.

That said, I'd love to see some focus on tooling which improves the painful parts you mentioned. I'd love to share binary data in Git and it's possible but painful. Similarly, the main selling point for Git on internal projects is the massive performance and usability wins over most of the competitors but there's no reason why that must be the case other than inertia on the part of the other options.

Xylakant · on March 3, 2015

Sure, networks drop packets and sometimes break down. But the number of issues with a solid office network is fairly low. Glass fiber, gigbit to the server. Folks don't work via VPN. I'm not pretending that the "road warrior" and remote worker use case isn't well served by git - but the "office worker tied to a desk" use case still exists. And from what I see is that it's more dominant than we'd expect.

acdha · on March 3, 2015

> But the number of issues with a solid office network is fairly low

The point was simply that “low” is not the same as “does not apply” and that matters when it's something which prevents someone from doing their job. Even when I worked at 100% on-site projects, I used git-svn so I could make local commits and ignore locking mishaps.

Don't get me wrong, however, I'm totally in agreement for having better tools for supporting the local, centralized workflow. The other reason I used git-svn was because merging was much more reliable and I could rebase changes to squash commits before sharing them with others. All three of those features should work well in any serious version control system regardless of whether it's centralized.

sgift · on March 3, 2015

Your machine will stop working sometimes, far more often than your office network stops working(1) and then all your nice little, local commits which haven't been pushed so far are down the gutter. There's always a trade-off, you just have to know it and work accordingly.

(1) If this doesn't apply to your office network get a better network. Now.

lmm · on March 3, 2015

> Your machine will stop working sometimes, far more often than your office network stops working(1)

Bollocks. I've had two machine failures in eight years, versus at least 20 network failures. Sure, maybe the network admins at five different companies I've worked for all just happen to be a bunch of muppets, but I highly doubt it.

sgift · on March 4, 2015

> Sure, maybe the network admins at five different companies I've worked for all just happen to be a bunch of muppets, but I highly doubt it.

That's your privilege. I don't. The only time I remember the network going down was after a complete air conditioning failure in the server room (a highly unlikely event in itself, but not IMPOSSIBLE) which forced a complete shutdown of IT services. And even then people could still work. Sure, not as well as usual but working was possible. The last time a machine failed was .. oh right. Yesterday.

lmm · on March 4, 2015

Your machine, or anyone's machine? Remember to multiply a network failure by the number of people if you're doing that kind of comparison.

Also, get better machines. The last time a machine failed at this all-macbook shop was several months ago.

acdha · on March 3, 2015

This statement reflects a misunderstanding of the problem:

In either case, you will lose unpublished work in the event of a catastrophic local drive failure.

Only in the case of a centralized system, you will also be unable to work unless the entire network path and remote server is available. This will almost never be a question of data loss but it means that you will be unable to perform version control operations until it's resolved.

wtetzner · on March 3, 2015

> Folks don't work via VPN.

Plenty of people work via VPN. Not just people that have remote jobs, but people that just work from home one or two days per week.

TeMPOraL · on March 3, 2015

VPNs are sort of primarily used to work via them.

kedean · on March 3, 2015

Don't forget everyone working for a consulting company.

mahyarm · on March 3, 2015

Entire remote offices work exclusively through VPNs to the main office. It's definitely a major usecase.

jimktrains2 · on March 3, 2015

> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

Why must I be part of a team? Why can't I just be hacking randomly and syncing my history with my personal server as I feel like it.

Xylakant · on March 3, 2015

> Why must I be part of a team? Why can't I just be hacking randomly and syncing my history with my personal server as I feel like it.

Sure, go, do. Just don't pretend that there are no teams and no other people that have different use cases.

Semiapies · on March 3, 2015

Likewise, acknowledge that many other people have use cases where git works very well for them. That might have more to do with its popularity than a mad love for DVCS.

Xylakant · on March 3, 2015

Go three post up, read the last paragraph. I'll quote it for your convenience:

> I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OS development, but it's advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigbit) and only ever work from there.

Semiapies · on March 3, 2015

Yes, and that argument was silly. There are many use cases besides dispersed teams and road warriors where git's weaknesses never actually come up and its strengths are useful. However, your arguments, like TFA's, rely on an unconvincing and entirely unproven premise that git doesn't actually suit most coders' use of it.

Xylakant · on March 3, 2015

> However, your arguments, like TFA's, rely on an unconvincing and entirely unproven premise that git doesn't actually suit most coders' use of it.

No. The premise is "a system is conceivable that has gits upsides and less of its downsides" and look - facebook is even building it.

Semiapies · on March 4, 2015

That's a premise so inane that it goes right into meaninglessness. Everything is "conceivable", especially "something that works as well as what I'm using in every way, but doesn't have problem X". It does nothing to conceive that, though.

As to what Facebook's building, meh. Anyone can try to build a better-in-every-way-for-every-application VCS, but look to TFA for a list of just some of the failed attempts to produce better version control mousetraps. It's more likely that they'll produce something that will be handy for niche uses than something that will be a clear win over git for everyone else.

jimktrains2 · on March 3, 2015

> Sure, go, do. Just don't pretend that there are no teams and no other people that have different use cases.

Sure, there are. I've worked on them. I've had networking not work, servers fail, diffs take forever because of an overloaded server. Honestly, at this point, I can't conceive of working in a centralized VCS anymore, so unless you make a salient point about what can be done with one that can't be done with a DVCS it's all opinion vs opinion.

gnaffle · on March 3, 2015

> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

It's not only "for local commits", although being able to have local branches without polluting a public namespace is a huge win. It's also about _speed_ when you're doing VCS operations. Linus Torvalds actually made the case really well in his talk: https://www.youtube.com/watch?v=4XpnKHJAok8

> There are a lot of companies that actually would prefer if the code never left the premises and have a use-case for finer grained permissions (some folks can only touch the assets, others can only ever see the dev branch, can't see history,...), things that are by definition not possible in a DVCS.

That's a question that's completely orthogonal to whether or not you use a DVCS. How is a "traditional" VCS going to help you when you can check out the code locally and smuggle it out on a flash drive?

In my company, we use git and there are access restrictions as to who can access and commit to our branches.

> Storing large assets in git sort of suck and requires ulgy hacks. I'd love to version the toolchain and the VM images for the local development environment, but that's just not feasible with git.

..and that's not the use case for git. Linus has been very clear about _what_ git is optimized for, performance wise.

That doesn't mean that DVCSes in general are useless for storing large assets, but that the most popular implementation is. Also, I'm not really sure what traditional VCS you're referring to, that makes it easy to version VM images and remain storage efficient?

icefox · on March 3, 2015

> I have wanted to have the full history while offline a grand total of maybe about six times.

I have probably run git init in a directory at least once a month to have full tracking capability without the expense of setting up a "server". It says something that the author assumes that a DVCS can only exists if it has a central server and that your local copy is nothing but the "offline" copy. I have also run projects where the github repo was just a copy and the version on my box was the official repo. I also run expensive (cpu/time) scripts that walk over the projects history, something the server admin would never let me do. And then we get into the realm of very expensive hooks that run on my desktop and not on the "server". And lastly even in an always connected world if your server is in Australia you can't change the speed of light, hitting it from Germany will always be a long "slow" trip.

Having the full repository at your disposal without being tied to some other authoritative repository provides a lot of flexibility and enables capabilities that are just not possible otherwise. Some features could easily be included in non-dvcs systems (such as local hooks), but I do not know if we would have seen their success without dvcs systems providing the means for exploration.

craigching · on March 3, 2015

Agree with you 100%, any project, no matter how small, gets "git init"ed with me simply to start tracking it in case I later regret that change I just made.

And I love that if I need it on another machine it's just "git clone ssh://..." instead of having to setup a server.

drostie · on March 3, 2015

Well, I mean, the author goes on to make a bunch of salient points like the difficulty of diffs of nested directories (solved in [1] but not widely implemented) and saving all of the binary blobs in every checkout (solved by [2] which has not "won out").

I think the big thing with DVCSes is that you can pretend you have a client-server model with a handful of directories on your local machine. At one point at my present job I replayed a bunch of SVN history through Hg so that I could by-hand divide the work that had been done into a few named branches; this helped me to figure out where things had "gone wrong" in the project. It was really effective to just have a day of SVN update, hg diff, copy files to ../branch_name, commit to hg, rinse, repeat. What I really needed was indeed the "killer feature" that he's saying -- sane branching and merging -- but the fact that it was all easily contained in my filesystem was a nice plus.

[1] http://www.cs.utexas.edu/~ecprice/papers/confluent_swat.pdf

[2] http://darcs.net/Internals/CacheSystem#lazy-repositories-and...

hbex5 · on March 3, 2015

I fell in love with git because it's FAST. And it's fast because it's distributed.

bryanlarsen · on March 3, 2015

Have you ever used Perforce?

konstruktor · on March 3, 2015

Perforce is not fast, it cheats: You tell the server in advance what you are doing, so that the submit is fast. This comes at a tremendous price: Your editor/IDE has to have a perforce plugin to do what should be the VCS' job (tell perforce what is happening in your workspace), and the connection to the server has to be reliable and low latency, lest you want to spend seconds every time you make an edit in a file that has not been checked out already.

In practice, this model is a constant source of frustration, and everything Perforce has done in the last few years seems to be workarounds for this broken architecture.

gumby · on March 3, 2015

Maybe it's just me, but "fast" isn't something that comes to my mind when I think of Perforce.

There are cases when P4 is the only choice (large binaries come to mind, and really really big code bases) but it's the kind of thing you shift to because you have to, not because you want to.

craigching · on March 3, 2015

> Maybe it's just me, but "fast" isn't something that comes to my mind when I think of Perforce.

Agreed. I use git-p4 for interacting with p4 servers here at work. I love that I can create my commits and make them as granular as I want without having to interact with the server until I'm ready to submit my commits.

Using git-p4 means I don't have to 'p4 edit' my files before I edit them (which really sucks when the p4 server isn't available for any reason), I can simply put off any version control workflows until I'm done with my changes (and slice and dice the changes the way I want with interactive rebase).

Thinking of all the little interactions I do with git which a) aren't possible with p4 or if they were b) would involve talking to the server every step of the way makes me cringe. But then out of necessity, p4 developers probably aren't creating such fine-grained commits like I like to do (and indeed isn't even possible without a lot of fore-planning with p4), so they wouldn't notice the speed impact.

konstruktor · on March 3, 2015

I believe you are conflating two aspects: Large binaries, which are, in very specific circumstances, a (or maybe the only) valid reason to use Perforce, and large codebases, which usually aren't.

When looking at an actual example of the latter [1], you will see that they are heavily optimizing against contention on the central database by limiting the size of database operations. If you want to do something that would require a longer database query: Enjoy your client side error message about implementation details you never wanted to know about.

[1] http://research.google.com/pubs/pub39983.html

gumby · on March 4, 2015

> I believe you are conflating two aspects: Large binaries, which are, in very specific circumstances, a (or maybe the only) valid reason to use Perforce, and large codebases, which usually aren't.

Unfortunately I did not mean to. I would agree with you that binaries are the 95% use case for P4. I think most developers typically wouldn't want to check many or any binaries in (maybe the odd icon or other small, slowly-changing asset, in which case git is adequate), but game developers and people with other binary stuff (e.g. circuit designs) will have large, changing binaries.

However really big (GB-scale) repos can be painful in git. This is why google gritted their teeth and used P4 until they outgrew it too. That's what I meant by "really really big" -- something of the scale that most of us will (hopefully) never see.

sergiotapia · on March 3, 2015

I quit my job because we were stuck using Perforce - how's that? :P

craigching · on March 3, 2015

Try the git-p4 contrib. It has its issues, but it's not bad if you're forced to use p4.

Crito · on March 3, 2015

I'll second this. git-p4 kept me sane through several years of working in a very large perforce shop.

wtetzner · on March 3, 2015

I have, and at least at my work, it's terrible. Sometimes it gets to the point where my submit requests time out several tries in a row. Yes, I'm sure it's because the people maintaining the server aren't doing it properly, but that's sort of the point. With git, you don't have to worry about it.

Xylakant · on March 3, 2015

> With git, you don't have to worry about it.

Wait until you see a badly managed git server that serves a central repository. You'll quickly change your mind if pushes start failing randomly.

deathanatos · on March 3, 2015

> Wait until you see a badly managed git server that serves a central repository. You'll quickly change your mind if pushes start failing randomly.

But I can't code a better sysadmin into either git or Perforce. A badly managed Perforce server will have the same issues. (Unless, of course, you have an argument that Perforce under bad management somehow performs better than git under similar conditions.)

With git, however, I can still commit, and I can still push and pull changes from other people using side channels such as email. I can, for the most part, keep working. Is it more difficult? Of course. But in the particular scenario, git outperforms Perforce, in my opinion. (But this is not the primary reason I use git; at work, we use GitHub and git in very much a centralized manner. GitHub has its outages, and they're annoying. But not work-ending.)

Xylakant · on March 3, 2015

The GP's words were: "With git, you don't have to worry about it."

I never said that perforce would perform in any way better, but I'd argue that if your VCS server is mismanaged you'd better change the person managing because a badly managed VCS means trouble all around. Try pushing your changes via email to your CI server. Fully decentralized would be beautiful, but I seriously don't see many teams that use git (or any other system) in that manner. Some parts of the infrastructure fundamentally end up being centralized, as stupid and wasteful as it is.

deathanatos · on March 4, 2015

I was responding to you. :-)

> I'd argue that if your VCS server is mismanaged you'd better change the person managing because a badly managed VCS means trouble all around.

I agree. The first time I read your argument, I interpreted it as a reason to not use git itself, but I think we're on the same page.

> Some parts of the infrastructure fundamentally end up being centralized, as stupid and wasteful as it is.

What else can be done? I don't really want to push changes to my co-workers individually, I want a place to push changes that any co-worker can then pull from — do I not? Toward this goal, certainly I could create n servers, and make pushing redundant over those n servers, have then do consensus to agree on HEAD, etc., but that seems to me to be what I'm paying GitHub to do.

HelloNurse · on March 4, 2015

If Git pushes start failing randomly, it isn't even a failure: no file has been harmed, and a new central "server" can be freely improvised.

lmm · on March 3, 2015

There are good managed git services for companies that don't want to run their own. If my sysadmins are truly incompetent I can use a free private bitbucket repository.

icefox · on March 3, 2015

But at least you can run git log and not have to wait minutes.

konstruktor · on March 3, 2015

Ironically, Perforce doesn't even have proper search in commit history when it's working as designed. No need for the server to break...

Xylakant · on March 3, 2015

True. You are spared some of the pain, but not all of it.

eitland · on March 3, 2015

I have used it for some full moons. I didn't like it. This was before I learned about DVCS, so my only real reference was SVN, but for what I did (medium-size business apps, a few years of history) I can't say it felt faster than SVN.

OTOH Perforce had some serious downsides like

  * immature eclipse plugin (would crash eclipse more 
    than once a week, forcing a full reset every.single.time)
  * no Netbeans plugin (back then)
  * the whole idea of having to decide in advance, -while in
    the office or otherwise connected, which files you 
    needed to modify and therefore "check out" -
  * which brings me to the next problem: by default only one 
    person can work at a file at the same time.

YMMW, I see some people liking Perforce, OTOH I see people using Macs for coding as well. (That last part is tongue-in-cheek, yes, I really wish I liked Macs and I recommend everyone to consider them.)

kyrra · on March 3, 2015

For those wondering, Perforce just announced their own DVCS implementation[0][1].

[0] http://www.perforce.com/blog/150303/introducing-helix

[1] http://www.perforce.com/helix

Crito · on March 3, 2015

Perforce sucks. From a UX perspective, it really is rotten. Furthermore, contrary to popular belief, it scales poorly.

Perforce scales provided you can continue to throw money and hardware at the machine your repo is on. After a while, for large software companies, that is no longer feasible. You have to split off into multiple perforce repos, at which point you abandon the benefits of a monolithic repo.

I have seen a very large software company abandon perforce for git for this reason. You can't push a single git repo as far as you can push a single perforce repo, but you can push a fleet of git repos way farther than you can push a single perforce repo, and a fleet of perforce repos is something you really don't want to deal with.

jimktrains2 · on March 3, 2015

You mean that proprietary code? No, I avoid non F/OSS at all costs.

politician · on March 3, 2015

Wherein it is pointed out that Hacker News is proprietary software.

jcl · on March 3, 2015

Really? https://github.com/wting/hackernews

squeaky-clean · on March 3, 2015

Where is the license file? Just because the code can be viewed does not make it legal to use or that it is "free" or "open source".

vonmoltke · on March 3, 2015

https://github.com/wting/hackernews/blob/master/copyright

jimktrains2 · on March 3, 2015

Hacker News is a website (so even with their code being open-sourced, I cannot verify what exactly they're running anyway), not software I'm running (JS aside).

The firmware on the routers between me and the server are proprietary, and there is nothing I can do about that.

I can choose not to run proprietary software without having to not use the web.

CHY872 · on March 3, 2015

You sure about that? Your system will be full of proprietary firmwares, drivers, etc.

jimktrains2 · on March 3, 2015

I avoid it as much as I possibly can. I don't know what more to tell you. I never said I'm proprietary code-free, but just that I avoid it at all costs.

Hell, I've recently been forced to install proprietary code to work with government generated data. Such a mess.

cssmoo · on March 3, 2015

* until you work on a project which has a 9Gb repo and history...

makmanalp · on March 3, 2015

You can do a shallow clone in any reasonable versioning system: http://stackoverflow.com/questions/6941889/is-git-clone-dept...

CHY872 · on March 3, 2015

but then you lose the ability to git bisect, blame etc, which is the root comment's advantage of doing git.

I've worked on a project where the git repository was man gigabytes - because at some point someone decided to put some binary files in the repository, which periodically got updated - now years on the repository's about 10GB, and you can't really delete the stuff clogging it up without rewriting history from years prior and making the 200 devs life hell.

Importantly, you do need all that history, because there are commits from the same time that are relevant.

samastur · on March 3, 2015

But then git blame or bisect don't really work anymore.

IanCal · on March 3, 2015

I'm not too sure what could be realistically done to implement offline bisect without getting the all the history you care about.

ufo · on March 4, 2015

One approach would be to query the server for these operations. And unlike what happens in Git, where blame is an O(N) operation, a centralized server is free to spend some extra storage to add some caches or indexes to make these searches faster.

IanCal · on March 4, 2015

Querying the server rarely works when you're offline.

Tobani · on March 3, 2015

It may work well enough.