It's such a shame git's pretty much won the DVCS wars - hg has so many neat features and a much cleaner user interface - but it seems that's just no longer relevant, given the difference in adoption.
That's quite a strong statement. Mercurial is less popular, but why would it not be relevant? There are great tools for it, great websites, and it's being very actively developed. Plenty of open source projects use it, and there's little downside to choosing it for your own projects.
Mercurial won't take over the world, and it won't even reach the adoption level of Git, but that doesn't strike me as particularly terrible in any tangible way.
Well, I don't think the VCS matters as much as the community does. And even if hg were clearly better (and git still does have some virtues, so it's not all that cut and dried), then running an open-source project that's not easily open to git users may be worse than missing a VCS with some cool features.
And then there are network effects - it's popular, so it gets built into things like Ruby's Bundler and now even Visual Studio. Hg may be nice, but that's an uphill battle, especially over something that's perhaps not even all that important (i.e. even using SVN isn't a development death-knell or anything).
Mercurial may let them continue growing in that way for a while, much like switching to Perforce would let them scale a single repo far beyond what either git or mercurial would tolerate, but it seems to me that they are delaying their problem (and likely digging themselves into quite a hole, depending on the unstated nature of their Mercurial modifications...)
Ultimately, being able to push a single repo into the dozens, hundreds, or even thousands of GB doesn't mean it is a good idea.
Why wouldn't it be a good idea, assuming we had tools capable of scaling indefinitely? We currently split large codebases into "repos" that are effectively isolated even though they're logically connected, and it would often be useful to do, say, atomic commits across them. This seems like a limitation of our current tools, not an intrinsically good thing.
The hypothetical infinitely scaling tool would inevitably be functionally equivalent to splitting repos. Basically all you would be doing is swapping around terminology. Instead of "here is Foo, our great big git system which stores thousands of repos" you would be saying "here is our Foo repo, it stores thousands of projects". Objections that git cannot be used because a git repo cannot hold as much data as a Perforce repo are more or less based on an improperly strict mapping of terminology ("we have a single Perforce repo, so we must have a single git repo").
In other words, those infinitely scaling systems already exist, and they are built on top of git (or hypothetically Hg, though I cannot think of any examples).
For one publicly visible example of the sort of thing that I am talking about, look into how Android development works. Android is 'in git', even though it is too large for a 'single' git repo.
(Note that due to the ways that code and responsibility are typically organized, most organizations are probably in a situation where migrating from a single monolith repo to many git repos would be a conceptually straightforward task, provided that they can break from the conceptual notion of having a 'single' repo. Atomic commits across repos are the primary pain-point, but you would be surprised how much that disappears as you grow used to working with many repos. Supporting a strong notion of versioned dependencies between packages goes a long way.)
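One way to picture "versioned dependencies between packages" is a manifest that pins each repo to a revision, so that a change spanning several repos becomes one edit to the manifest. Below is a minimal Python sketch of that idea. The names `Manifest`, `bump` and `skew`, the repo names and the revision strings are all hypothetical, and this is not the format used by Android's tooling.

```python
from typing import Dict, Tuple

# Hypothetical manifest: repo name -> pinned revision (tag or commit id).
Manifest = Dict[str, str]

def bump(manifest: Manifest, updates: Manifest) -> Manifest:
    """Return a new manifest with several pins changed in one step."""
    unknown = set(updates) - set(manifest)
    if unknown:
        raise KeyError(f"unknown repos: {sorted(unknown)}")
    # The old manifest is left untouched, so both versions stay comparable.
    return {**manifest, **updates}

def skew(a: Manifest, b: Manifest) -> Dict[str, Tuple[str, str]]:
    """List repos whose pinned revisions differ between two manifests."""
    return {name: (a[name], b[name])
            for name in a
            if name in b and a[name] != b[name]}
```

Publishing the new manifest is the only step that has to be atomic here, which is roughly why the lack of cross-repo atomic commits hurts less than one might expect.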
Thanks for the pointer to Android, I hadn't looked at how that was organized before.
I agree that atomic commits are a red herring. They're nice to have, but by the time you outgrow a single git repo, you also have projects with different release schedules, and once you have that, you have to deal with version skew anyway, and then you don't really need atomic commits.
I disagree that "those infinitely scaling systems already exist," though. Looking at Android, they had to build a nontrivial wrapper around git to make it work, and it's not totally transparent. You have to think about when to use 'repo' and when to use 'git', and where the boundaries between repos are.
There are huge benefits to having everything be in one giant pile of code and being able to import and sometimes modify code from far away parts of the tree with minimal overhead. The key is to let any directory be a "project" that you can refer to, without any arbitrary distinction between top-level project directories and others. This lets you do things like spin out a part of a project as a semi-independent library without moving files around or creating new repos.
There are some downsides too, of course. Google eventually broke from this model slightly by introducing components, which had issues of their own. Perhaps Facebook has done it better.
So what you're saying is: if you manually manage versions of your dependencies, your _version_ control system works?
I think that's a shame, and it doesn't work very well either. Mercurial does the same thing incidentally (unless facebook have somehow solved that), so it's no better there.
Splitting repos is a pain; perhaps a necessary one, but hardly ideal. It just introduces a bunch of extra administration, and it reduces the power of your VCS primitives (such as branching and merging).
> I think that's a shame, and it doesn't work very well either. Mercurial does the same thing incidentally (unless facebook have somehow solved that), so it's no better there.
The way I interpreted the reddit post from the Facebook engineer is that they looked into customizing git to scale better for them, but the codebase wasn't to their liking, so they're going to customize mercurial's instead.
> The matter of customizing git came up and people looked at the code and decided it's pretty convoluted when compared to the Mercurial code.
So it's not like mercurial is able to scale in a way that git can't, it's that they plan to make mercurial scale in a way that git can't. (Any over/under on when they give up on that and decide to use Perforce's Git Fusion?)
I remember reading about this. It would really help if Git supported inotify; there is no reason to stat every file on git status/diff/etc. I remember a mailing list thread about this in the past, but I don't think it ever got off the ground. I know there was a lot of discussion on how to support the feature cross-platform.
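To make the cost concrete, here is a minimal Python sketch of a stat-based scan: it walks a tree and compares each file's size and mtime against a cached snapshot, which is roughly the kind of work a status command has to do when nothing is watching the filesystem. The function names `snapshot` and `changed_paths` are made up for illustration, and this is not how git actually stores or checks its index.

```python
import os

def snapshot(root):
    """Record (size, mtime_ns) for every file under root."""
    seen = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # One stat call per file: this is the cost that grows with repo size.
            st = os.lstat(path)
            seen[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return seen

def changed_paths(old, new):
    """Return paths that were added, removed, or whose stat data differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

With a notification mechanism like inotify, the set of touched paths would be handed over by the kernel instead, so the work would scale with the number of changes rather than the number of files.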
On a slightly unrelated tangent, the tup build system supports inotify, which I appreciate.
>> "I give a ton of credit to Linus for having created git back in the day."
Back in the day? It seems like git just came out; are just a few years already like ancient times? I don't get why they can't use git? Git is used to maintain the entire Linux kernel and they want some system rarely used?
I wish people could make up their minds on what DVCS we should all be using and stick with it for 20 or 30 years.
> "I don't get why they can't use git? Git is used to maintain the entire Linux kernel and they want some system rarely used?"
They are presumably trying to use a single repo for a very large chunk of code. Large, not in the sense that the Linux kernel is large, but a repo one (maybe two or more) orders of magnitude larger. Multiple gigabytes large, with most of those gigabytes taken up by small objects.
Git does indeed start to fall down with these sorts of repos. The thing is, if you keep on growing like that, other VCSs will as well. Even with Perforce you'll hit a wall eventually.
The solution is to restructure your project, breaking it into several different repos. Understandably, large organizations capable of creating large repos like this are typically going to be resistant to large changes like this brought on by what they interpret as limitations of tooling. Doing it requires restructuring code, training, and may require extensive modification to internal systems that work with your codebase.
My perspective is that this 'limitation' of git is really a symptom of an anti-pattern that should be corrected sooner, rather than later, before you find yourself in a really painful situation (read: three years later your repo is now 10-100 times as large, and you are discovering that continuing to scale with Perforce is becoming untenable).
Linus introduced Linux in 1991. Linus introduced git several years ago primarily for Linux. git is used to maintain code that's over 2 decades old. My comment seems to me to stand.
You're right, I should have been more precise: They seem to be switching to Mercurial because they think it's easier to customize Mercurial in order to address scaling issues they're having. (And I'd guess that some of those customizations are going to end up in a future Mercurial release.)
Btw, I'm pretty impressed by Facebook's open source efforts.
HgGit is very slow; and it's hard to use as a real git replacement - I suspect, anyhow. I use it from time to time when I need some specific feature (like mercurial's lovely history query language) that's tricky in git, but I've never tried it as a real replacement, so maybe it's workable nevertheless - have you tried using hggit to really collaborate on a repo actively?
I use it as a real Git replacement, every day at the office.
We have small teams and many mostly small repositories, so I am fetching something like <20 changes into each of <5 repositories every day. I have no complaints about performance or correctness. The main pain is interacting with build systems that quite naturally think in terms of Git hashes; rather than
I think of Mercurial being to Git as FreeBSD is to Linux, Greece was to imperial Rome, and Britain is to America. Less populous, less dominant, but providing a source of creativity and a sophisticating influence.
It's mainly a naming confusion with Git branches. HG branches resemble SVN ones - a copy of the working tree performed from the place of branching. HG 'bookmarks' are Git 'branches' - just references to a commit.
Now I'm still not sure I got everything straight in my head:
The HG wiki says "Git, by contrast, has "branches" that are not stored in history, which is useful for working with numerous short-lived feature branches, but makes future auditing impossible."
Git branches are just commits on a separate path in the tree, right? As long as the branch is merged with git merge --no-ff (which isn't the default, yes, I know), a git branch becomes a permanent part of the history due to the commit created during the merge. The fast-forward option is just that: optional. It's only useful as a history-rewriting tool, kind of the way squashing commits is useful.
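The difference between a fast-forward and a --no-ff merge is easier to see on a toy commit graph. The following Python sketch models commits as ids with parent tuples and branches as plain refs. The `Repo` class and the `build` helper are invented for illustration and only mimic the shape of the history, not git's real object model.

```python
from dataclasses import dataclass, field

@dataclass
class Repo:
    parents: dict = field(default_factory=dict)  # commit id -> tuple of parent ids
    refs: dict = field(default_factory=dict)     # branch name -> commit id

    def commit(self, branch, cid, extra_parents=()):
        """Add commit `cid` on top of `branch` and move the branch ref to it."""
        tip = self.refs.get(branch)
        first = (tip,) if tip is not None else ()
        self.parents[cid] = first + tuple(extra_parents)
        self.refs[branch] = cid

    def is_ancestor(self, old, new):
        """True if `old` is reachable from `new` by following parent links."""
        stack, seen = [new], set()
        while stack:
            c = stack.pop()
            if c == old:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return False

    def merge(self, target, source, cid, no_ff=False):
        """Merge `source` into `target`; fast-forward when possible unless no_ff."""
        if not no_ff and self.is_ancestor(self.refs[target], self.refs[source]):
            self.refs[target] = self.refs[source]  # fast-forward: only the ref moves
        else:
            # A merge commit with two parents records that a side line existed.
            self.commit(target, cid, extra_parents=(self.refs[source],))

def build(no_ff):
    """History A on master, then B and C on feature, then merge feature into master."""
    r = Repo()
    r.commit("master", "A")
    r.refs["feature"] = r.refs["master"]  # creating a branch is just a new ref
    r.commit("feature", "B")
    r.commit("feature", "C")
    r.merge("master", "feature", "M", no_ff=no_ff)
    return r
```

In the fast-forward case no commit M exists, so once the feature ref is deleted nothing in the graph distinguishes B and C from commits made directly on master. With no_ff=True, M has parents A and C, and the side line stays visible in the graph.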
I guess I'm asking, what does the HG wiki mean, in concrete terms? What is impossible with git?
Also, to be sure I understood the original comment well, I'll try to sum up the difference between HG branches and bookmarks. Is it correct?
HG branches copy-on-write the entire source tree. This means a HG branch cannot be changed if the history before the branch changes. HG bookmarks (and git branches) can change later if the history changes.
Nothing is impossible; it works like you described. In very simple terms: in Git's philosophy rewriting history is a feature, while according to HG it's a bug (technically it is possible in HG too, AFAIK, but I'm not sure it's used for anything except critical-fix scenarios). That's all there is to it. Hence the term 'branch' in HG refers to something permanent, and 'bookmarks' provide the lightweight functionality. At their roots, the two have different approaches to managing history (back when I used HG more, I didn't understand all the noise around rewriting Git history, the obsession with keeping it clean and linear, etc.).
http://mercurial.selenic.com/wiki/ChangesetEvolution
The lead dev for that project just got hired by Facebook (actually, almost all of the core hg team is getting hired by FB), and the feature itself is like 85% complete.
Soon we will have a safe way to collaboratively edit history!