Sapling: Source control that's user-friendly and scalable (2022)

Yasuraka · on Sept 10, 2024

I've been waiting for the server-side of things to be open-sourced ever since this announcement.

In the meantime, I've been enjoying jj/jujutsu (https://github.com/martinvonz/jj), which started as a 20% project and has been out (and developed) in the open ever since.

An introduction by Chris Krycho for those familiar with some form of vcs: https://v5.chriskrycho.com/essays/jj-init/

"Jujutsu brings to the table a few key concepts — none of which are themselves novel, but the combination of which is really nice to use in practice:

Changes are distinct from revisions: an idea borrowed from Mercurial, but quite different from Git’s model. Conflicts are first-class items: an idea borrowed from Pijul and Darcs. The user interface is not only reasonable but actually really good: an idea borrowed from… literally every VCS other than Git."

aseipp · on Sept 10, 2024

The server-side IS open source -- it's basically been in the source tree since day 1, but "inert" due to facebook-only dependencies that made the code unusable, but most of the stuff was actually there if you were willing to dig deep. And actually, they just recently said it became "usable for unsupported experimentation"[1], so it's been on my TODO list to give Mononoke a run and see how it operates.

Unfortunately the builds for Mononoke are not actually uploaded on GitHub due to a bug in the GHA setup[2] that I reported :')

[1] https://github.com/facebook/sapling?tab=readme-ov-file#monon...

[2] https://github.com/facebook/sapling/issues/922

dsissitka · on Sept 10, 2024

How does https://martinvonz.github.io/jj/latest/FAQ/#can-i-prevent-ju... work in practice?

It feels like you'd end up with a bunch of junk in your repo.

steveklabnik · on Sept 10, 2024

In practice, for more "modern" projects, it works fine: they tend to have good hygiene anyway. For Rust projects, for example, you .gitignore the target directory, and you're good for 99% of projects (and cargo new already generates said ignore.)

For older projects that leave temporary files everywhere, or for workflows that may generate intermediate files (like coredumps are often generated in the cwd and not somewhere easily ignored), it can be a pain.

However, in the next release, a feature related to this has landed: you can add a configuration, `snapshot.auto-track="none()"`, to turn off automatically tracking files, and then `jj track` individual paths (with globs, of course), if you prefer that workflow.

aseipp · on Sept 10, 2024

It basically works fine, but there are some annoying edge cases that even I, a jj developer, have run into. Probably the most "common" one (and it's not that common, just moreso than the rest) when you switch branches with different `.gitignore` files, you may stop ignoring some junk from the other branch when you switch. Kind of annoying.

But it's overall pretty marginal in projects with good ignore hygiene, and there are solutions to that problem too (to varying degrees of ease.)

smaudet · on Sept 10, 2024

> Conflicts are first-class items

I have less experience working in large (heavily tracked active, e.g. 1k+ committers) repos, however we try to keep conflicts to a minimum - of course you can't get rid of them completely, normally merge/rebase I have close to zero conflicts on the regular, or those that I have are automatically (non-ai) merged (correctly).

I suppose I find the idea of conflicts as first class entities as somewhat intriguing, do they function as a sort of super-feature? If e.g. you have 10 features all touching the same area of code, then do you get maybe 3 combined conflict(ing) features?

At any rate, I think while it might be a useful metric, there should be an aim to keep conflicts to an absolute minimum....

steveklabnik · on Sept 10, 2024

One thing that "conflicts as first class items" implies is that commits can exist in a conflicted state. This implies that you don't have to deal with conflicts right away. Let me explain.

jj's rebase works just like git's rebase conceptually: take these commits and move them on top of some different commits. But unlike git, `jj rebase` will aways succeed, and succeed immediately. The resulting rebased commits/changes will just be in a conflicted state if a conflict happens. This decouples the "do the rebase" from "fix the conflicts" in time, meaning I can choose if I want to handle them right away or later.

This ends up being very powerful in a few ways: for one, because it will always succeed, and succeed quickly, we can start automatically rebasing. Let's say you have a branch with three changes on it. You go back and edit the first change, because you forgot to do something. jj will then automatically rebase the two children, and you immediately see if you've introduced a conflict. You can then fix it right now, or you can continue working on change #1, until you're ready to address the conflict. Let's say you decide to do that, and the conflict originates in change #2: when you fix that conflict, #3 gets rebased again, and in this hypothetical story, the conflict goes away.

It's also nice if you have a bunch of outstanding work, and you want to rebase it all on top of your latest trunk/main/master branch: run the command to do so, and it'll all get rebased right away, and you can immediately see what has a conflict and what doesn't, and fix them when you feel like it, rather than doing it right now just so that the command completes.

There are secondary advantages, like the fact that this means rebases can happen in-memory, rather than through materializing files. This means it's super fast, and doesn't interrupt what you're doing. But I think those kinds of things are harder to grok than the workflow improvements.

keybored · on Sept 10, 2024

That sounds like (sounds like at least) a breath of fresh air to be honest. Conflicts are very important to deal with and yet are a second-class concept in Git. And if you primarily use rebase you have to rely on git-rerere. And if you screw up a rebase? I guess you just `git rerere forget` one of those stored files? But what if you forget that it was slightly messed up? Well I guess it might just lie around as an opaque “conflict cache” resolution.

At least I can use `git show --remerge diff` on a merge commit that had a conflict. That gives some after-the-fact insight.

aseipp · on Sept 10, 2024

It doesn't work like that. I should probably rewrite that "first class" part in the docs and use a real example (I wrote it originally), but basically it comes down to this:

When a commit is in a conflicted state, the fact it is conflicted is recorded inside the commit object.

Let's say I have a base B which is the main branch, three commits X Y Z that do 3 different things, and my set of changes that aren't committed, called "@".

    B (main) ---> X ---> Y ---> Z ---> @

Now someone pushes to main, so the full graph now actually looks like this with the new main called B'

    B ---> B' (main)
    \
     \---> X ---> Y ---> Z ---> @

Let's say that B' has a change that will cause a conflict in Y. What does that mean? It means that if we were to change the parent of X to B', then the state of Y would be invalid and thus in conflict, because the changes are not compatible with the state of the system.

`jj rebase -d main -s X` will give you a graph just like this, with such a conflict:

    B ---> B' ---> X ---> Y ---> Z ---> @
                          C      C      C

The marker 'C' means "This commit is conflicted." Note that all descendants of a conflicted commit are conflicted too, unless they solve the conflict. (This sentence is phrased very carefully because I'm about to show you a magic trick.)

Okay... So now in my filesystem, I can go see the conflict and I resolve it. Maybe Y renamed a variable that B' got rid in file foo.c, or something.

So now I can solve this conflict. But how do I resolve it? This is too much of a topic to discuss in general but here is the magic trick:

- Solve the conflict in your working copy

- Move the conflict resolution into the first conflicted commit, Y

- The conflict resolution will be propagated to all descendants, just the same way the conflict itself was propagated.

Step 1: solve the conflict in your working copy. Now my history looks like this.

    B ---> B' ---> X ---> Y ---> Z ---> @
                          C      C

Note: @ is no longer conflicted! We solved the conflict there, so it is OK. Now how do we resolve the conflict in Y and Z?

Step 2: `jj squash --from @ --into Y --interactive` will move any changes you select in a diff editor and then move that diff into the other commit.

Now the graph looks like this:

    B ---> B' ---> X ---> Y ---> Z ---> @

I moved the resolution of the conflict into Y. And so the resolution of the conflict is propagated to Z.

Step 3: There is no step 3. You are done.

So the secret is that, Jujutsu tracks conflicts and the relationships between conflicts in the commit graph, just like commits. This is why they are "first class." Git basically doesn't do any of this. A commit with a conflict and a commit without one are indistinguishable in Git, unless you look at the actual diff and see rejected hunk markers. A conflicted hunk in a modified file in Git is no different than any other hunk.

This is already too long but as an addendum what I used to do is describe the conflict support as "git rebase --update-refs, combined with git rerere, on 1000x steroids." Except that's actually a shit way to describe this functionality, because only like 5 people on Planet Earth know about --update-refs or rerere. So you really need to experience it yourself or see a step-by-step, I'm afraid.

smaudet · on Sept 10, 2024

That all makes a lot of sense! I didn't know about git rerere - maybe I'll make a point to try jj out if it has optimized the diff resolution...

It also closely matches how I actually work, but the twist that you get to pick what time to perform the resolution. If you frequently integrate, your changes are considered newer than whatever was downstream, and so you don't have to re-perform any of your changes....buuut if you have a stack of commits that can take a long while. Plus, you might upset your IDE if there are frequent file changes, so I avoid integrating too often...

It's a nice concept. My one question - can you track history across branches? I frequently "save old state" just in case I need something pre-integrate or my IDE is dumb and something's locked/gets wiped/AV freaks out and deletes my repo files... It sounds like you could sort of easily build that sort of thing on top...

eterps · on Sept 10, 2024

Also check out lazyjj: https://github.com/Cretezy/lazyjj

0cf8612b2e1e · on Sept 10, 2024

The readme did not make it clear: does this have a TUI workflow for moving line chunks around?

That is my one reluctance towards jj is that my (bad practices?) frequently have me touch a few different unrelated bits simultaneously (eg editing foo, but fix a typo in bar) which I use stage to iteratively commit. From the jj stuff I have read thus far, my workflow would be a bit clunky to adapt.

aseipp · on Sept 10, 2024

`jj split --interactive` is built in, and will handle that exact case for you. It also comes with a built in TUI so you can select lines right from the terminal immediately.

You can even do `jj split --parallel` which will not only split the changes into two commits, but then make the commits siblings (instead of child/parent), so you can immediately push things if they're easy fixes.

I have a base of B with my commit @ that contains a bunch of changes.

    B --> @

`jj split` will give me this graph, where `X` contains the changes I selected, and `@` contains all the changes I didn't select:

    B --> X --> @

Instead, if I did `jj split --parallel`, I would get this:

    B --> @
     \
      --> X

Let's say `X` is a typofix and @ contains your feature you're not done with yet. OK, just run `jj git push -c X` and jj will create a branch name for you, and push it to the remote for code review. Done. You don't have to switch branches or do anything. You basically can just stop thinking about X since it will probably get merged.

0cf8612b2e1e · on Sept 11, 2024

Great visual and I will definitely have to check it out now.

steveklabnik · on Sept 10, 2024

> frequently have me touch a few different unrelated bits simultaneously (eg editing foo, but fix a typo in bar) which I use stage to iteratively commit. From the jj stuff I have read thus far, my workflow would be a bit clunky to adapt.

I do this too! I find it easier in jj than in git. I wrote out a lengthy thing about the "oh I found a typo" use case here: https://lobste.rs/s/yjqd6d/against_names#c_7yfw7g

0cf8612b2e1e · on Sept 11, 2024

Thanks for the write up, but that’s a bit more care and thought than I usually have. The outlined workflow is more, “I see a second change I need to do how should I prep for it”. Whereas I am more likely to do a bunch of work and try to separate it later. Again, not defending my workflow, just what I end up doing.

The other response with the ‘jj split’ example is exactly the kind of workflow I envisioned where you can supposedly move the chunks between commits.

steveklabnik · on Sept 11, 2024

Ahh yeah, if you want to take things apart after the fact, split is what you want.

No need to be defensive about workflows! The fact that you don’t need to plan ahead is a strength of a tool, in my mind, not a weakness. Nobody would claim that someone is a bad engineer for saving a file when their work is in progress, and the same is true for version control as far as I’m concerned.

riedel · on Sept 10, 2024

Thanks, looks great. Found this comparison: https://martinvonz.github.io/jj/latest/sapling-comparison/

Will try it for sure.

yellowapple · on Sept 10, 2024

Unfortunately the default name for the command (sl) conflicts with a command-line tool that's essential for my workflow: https://github.com/mtoyoda/sl

lordgrenville · on Sept 10, 2024

It also conflicts with the PowerShell built-in Set-Location (though I imagine that's a pretty small intersection of users...)

meindnoch · on Sept 10, 2024

I spent 1 hour tracking down a bug caused by this.

yellowapple · on Sept 10, 2024

Now that sounds like a fun story :)

paxys · on Sept 10, 2024

So...change it?

hinkley · on Sept 10, 2024

Click on the link.

harikb · on Sept 10, 2024

wmf · on Sept 10, 2024

People might be more interested in the public version of Sapling https://sapling-scm.com/ rather than a blog post from 2022.

aseipp · on Sept 10, 2024

To be fair, the website has changed very little since OSS launch (when this blog post first came out) and almost all of the main points of the blog are still the same, and the selling points are the same, too. So, I think it's fine, but definitely the website is the next place to look, yes.

Probably the biggest major changes in Sapling I can personally think of are the introduction of the experimental "Dotgit" mode (.git exists next to .sl) and that the server-side stuff has recently become more usable. But none of it is stable yet, so for users not much has changed (yet).

zamalek · on Sept 10, 2024

My biggest issue with Juijitsu and Sapling (especially) is compatibility with a repo where nobody else uses these tools. Sapling's problem is pretty obvious: I would have to get others to use FBs merge stack tool and get that past security approval.

JJ? I spent a day trying to rebase/merge from trunk into my PR branch and truly fucked it up in a way that I have never managed with plain old Git - Google results were pretty scarce and unhelpful. For Git, I `switch foo; rebase main; push --force-with-lease`, for JJ (apparently) I `rebase -b main -d foo` - great! How do I access the results of the rebase and update my PR branch? How do I force push that branch? It feels like the documentation is in the same place that Git was in during the early days - I assumes you are deeply familiar with the idiomatic workflow. The effort put into the migration guide is minimal[1].

/rant

[1]: https://martinvonz.github.io/jj/latest/git-comparison/

steveklabnik · on Sept 10, 2024

Sorry you had a bad time with jj!

> Google results were pretty scarce and unhelpful.

This is true, the Discord is very active and helpful though: https://discord.gg/bKWAbtpH Discord isn't for everyone, but you can get help quickly. You can also ask questions as GitHub Discussions: https://github.com/martinvonz/jj/discussions

> How do I access the results of the rebase and update my PR branch?

So, one small difference between jj and git is that jj's branches don't move automatically if you add new changes on top of them. In the next jj release, "branches" will be renamed to "bookmarks" to kinda emphasize this difference, even though they're used for git interop. Additionally, there's some discussion about what making branches/bookmarks move like git branches move could mean, with an experimental setting to give that a try.

With all of that said, when you rebased, it shouldn't have changed where the head of your branch was pointed to, so you shouldn't have needed to actually move anything here. This is a side effect of the change/commit distinction: the branch points to a change, so when you rebase your stuff, the commits will change, but the change will not. That means all you needed to do was...

> How do I force push that branch?

`jj git push` will force push all of your changes.

> It feels like the documentation is in the same place that Git was in during the early days - I assumes you are deeply familiar with the idiomatic workflow.

I agree that the documentation isn't in a super awesome place yet, but I think the cause is kind of the opposite: there isn't really one idiomatic workflow, but instead a few common ones, and a hesitation to demand that a specific workflow is idiomatic.

That being said, I have an in-progress tutorial that I have been waiting for two recent changes to land (one being the branch -> bookmark rename, the other being a different rename (obslog -> evolog)), that is intended to land upstream once I'm done with it: https://steveklabnik.github.io/jujutsu-tutorial/

After that, if I can keep finding the energy, I'd love to keep improving the documentation, but I have a lot less FOSS energy than I used to, so we'll see.

zamalek · on Sept 10, 2024

Thanks for taking the time to address my concerns. I am still very much open to trying it again in the future, but just like you I have time constraints and have to find a time box for it.

What I would really like to see an extremely simple walkthrough that covers some really basic scenarios without the ceremony of setting up a new JJ repo, for busy people who want to pilot something new - that covers all of:

* Central repository using branches (single remote) OR GitHub-style forked repository (multiple remotes)

* Merge-based workflow OR rebase-based workflow, both without conflicts

Using github.com/martinvonz/jj as an example repo - show us how to draw the entire owl

Take me from opening a terminal in order to make my changes, to creating a commit with those changes, merging/rebasing the trunk to my branch, to pushing it to my remote branch. Remember: I (and the rest of the 80%) have a job to do, so we want the cliff notes to reach success, and then we can dive deeper into things as we encounter them naturally.

Once you have the trivial workflow out of the way you can progressively disclose other things. For example, for conflict resolution/merging, you could start with an example that's already in a conflicted state and walk through that - i.e. an example that solely focuses on a specific issue. Remember, your new users want to achieve things first - think from their perspective "oh shit, this rebase has resulted in a conflict, what do I do?" or "my team uses gitflow, how do I interact with that?" Tiny little tutorials that cover common scenarios go a lot further than a monolithic detailed tutorial.

steveklabnik · on Sept 10, 2024

Thank you for the input!

> just like you I have time constraints and have to find a time box for it.

100%, even with jj itself, it took me two or three tries hearing about it before I decided to try it, and then twice of giving it a shot before it stuck. Once it stuck, it really stuck, for me at least.

> Once you have the trivial workflow out of the way you can progressively disclose other things.

This is sorta kinda how I have my tutorial set up, but I suspect I go a bit slow for you at the start. I tend to be more of a bottom-up learner myself, and this is a common criticism of the Rust book.

Part of the reason I wanted to wait before upstreaming what I have is so that I have the freedom for more radical changes, so I will give what you're saying some consideration. I 100% agree that workflows are super super important to demonstrate, it was honestly kind of some of the motivation to write this tutorial in the first place.

Minion3665 · on Sept 10, 2024

oh hey- that's your guide? It's awesome, highly recommend to anyone else wanting to give jujutsu a try! ... I read through it when I didn't know what jujutsu was. It not only convinced me to try jujutsu out, but laid the groundwork for me to get started really quickly and easily.

steveklabnik · on Sept 10, 2024

It is, thank you and I'm glad to hear it!

Aerbil313 · on Sept 10, 2024

If you come into jj and try to just use it without going through the tutorial[1] you’ll just gert frustrated and disappointed. Jj is not git and the workflow is very much different, arguably far better than git. I went through the tutorial first and had no issues at all and I actually understood the how it works under the hood as a VCS. With git I’ve never understood how it works and just used clone, rebase, commit, push.

Seems like you had a bad day. Luckily the day I tried it it occurred me to first RTFM.

1: Steve Klabnik’s tutorial is arguably the best one atm.

aseipp · on Sept 10, 2024

> Sapling's problem is pretty obvious: I would have to get others to use FBs merge stack tool and get that past security approval.

You don't have to use Sapling's integration at all, FWIW. I completely avoided PR functionality when I first started using it, and would just do a branch for every commit (which is what it would do for you, anyway.) I think almost every tool imaginable can do a diff between branches?

> for JJ (apparently) I `rebase -b main -d foo`

These arguments are backwards. What you want is to flip the -d and -b switches which mean "destination" and "branch" respectively:

    jj rebase -d main -b whatever-branch-you-want
    jj git push -b whatever-branch-you-want

You don't have to switch to any branches or do anything else to make this work. These two commands will basically work no matter what the state of the tree is.

> It feels like the documentation is in the same place that Git was in during the early days

Yeah. We've talked about this. It's not great but we have so many things going on right now, nobody has totally taken over documentation. Steve Klabnik's Jujutsu tutorial is very popular and we've even considered asking if we could rewrite our documentation using it, but again, so little time.

https://steveklabnik.github.io/jujutsu-tutorial/introduction...

(I'm one of the Jujutsu developers, in case it wasn't clear from the last paragraph.)

jauntywundrkind · on Sept 10, 2024

(2022)

What's new since? Where are we seeing some adoption? Any progress on:

> When used with our Sapling-compatible server and virtual file system (we hope to open-source these in the future),

Is there significant git/sapling adoption at Meta, or is mercurial/eden still the main thing folks use?

It's a neat idea. The smartlog command being a top level entrance point to a subdirectory in a monorepo seems smart as heck, a fine start.

loeg · on Sept 10, 2024

Everyone still uses the 'hg' command and edenfs, but I think those are just UI for sapling? I might be mistaken.

phyrex · on Sept 10, 2024

Sapling is the mercurial fork. In all the codebases I’ve worked on we’ve never used the git integration

OliverGilan · on Sept 10, 2024

Is there any way to even use this today? I’ve been waiting for the server and file system components to be open sourced for a couple years now

_sgianelli · on Sept 10, 2024

I used it for the last year or two at a company where all of our source control was with github. It has bindings to interact with github (not sure about what else) that allowed me to use it locally without anyone else having to change their workflows/install anything. Granted, I was mostly using it for stacked pr management among a few other things and not really fully taking advantage of all of it.

aseipp · on Sept 10, 2024

Yes, it uses Git as the default backend, so it's more or less just a different interface to a Git repository. Everyone today uses it this way.

The server-side components have been open source, but not fully usable due to fb-only code. That's changing and you can in theory build a working OSS server now that works on mysql/s3, but it isn't supported yet.

nathan_compton · on Sept 10, 2024

I won't use any version control system that doesn't let me stage individual changes in files. Staging entire files at once is just wrong.

aseipp · on Sept 10, 2024

You can do that with Sapling, in fact it's both very easy and conceptually more robust than Git because with Sapling you are just manipulating commits. Sapling also includes `sl web` which makes interactive commits, splitting, and rebasing a breeze. In Sapling, commits are easy to manipulate, so there's nearly no need for the staging area, unlike Git where the staging area is easy to manipulate but commits are unreasonably hard. You can stage temporary changes to your hearts desire with a slight mindset shift.

FrancoisBosun · on Sept 10, 2024

I use Sapling as my main driver. I can commit or revert individual hunks just fine.

zzl0 · on Sept 10, 2024

While there is no staging area, the powerful "--interactive" option is used to select specific files or lines you want committed.

Besides that, there are powerful "split" feature to split a commit into smaller commits or move changes between commits.

rPlayer6554 · on Sept 10, 2024

I'm glad they opened this for others to use. When I worked at FB the VScode plugin was so neat and easy to use!

mdaniel · on Sept 10, 2024

discussed at the time: https://news.ycombinator.com/item?id=33612410

codethief · on Sept 10, 2024

A friend of mine works at Meta and recently gave me an intro to Sapling. Since then I've caught myself several times in my day to day, realizing how useful Sapling would be in my work.

mistrial9 · on Sept 10, 2024

isn't Meta like some kind of real-life "borg" setup, where you can participate only to the extent that you are documented and tracked while doing so? Isn't it obvious that American capital sees no internal constraint on their use of people's whole social identity and lives as products for their machines? Why would anyone depend on their Meta core without knowing this?

szundi · on Sept 10, 2024

Sometimes they make mistakes haha

oftenwrong · on Sept 10, 2024

Seems like an approach that is superficially similar in architecture to what Google uses:

Piper <-> Mononoke

CitC <-> EdenFS

Obviously, the immense scale of these companies constrains the possible solutions. I would be interested to know what design decisions are different between them.

dazzawazza · on Sept 10, 2024

Any idea how it deals with binary files and locking of files? I can't find anything in the docs about it.

I'm a gamedev and currently Perforce and Subversion are the best in this space (because of robust Binary and locking support).

whalesalad · on Sept 10, 2024

facebook will do anything but use git

0cf8612b2e1e · on Sept 10, 2024

Just because git won does not mean it is the apex in design. Especially for a monolithic company, what does a decentralized operation buy you?

steveklabnik · on Sept 10, 2024

Just because they don't use git doesn't mean they use centralized VCS.

DaiPlusPlus · on Sept 10, 2024

Why is that?

loeg · on Sept 10, 2024

Big big monorepo and lots of effort invested in making mercurial usable with said monorepo. Off the shelf git wouldn't work. There's also 10+ years of operational experience with the mercurial toolchain and that has some value.

steveklabnik · on Sept 10, 2024

https://graphite.dev/blog/why-facebook-doesnt-use-git

The short of it was, way back in the day, the mercurial folks were more amenable to working with FaceBook than the git folks were.

keybored · on Sept 10, 2024

All of Big Tech companies use hive repos, I mean monorepos, and Git can’t cope out of the box with that (except maybe at Microsoft since they have some people working on that).

jasonpeacock · on Sept 10, 2024

Amazon doesn't use monorepos. But then different tooling had to be created to manage dependencies and performs builds.

You either have a large monorepo with special tooling, or many repos with special tooling. Either way, if you're a large company then you have special tooling...

mhh__ · on Sept 10, 2024

git can cope a lot better than some seem to realize. Obviously it chugs on a truly huge repo but this is genuinely enormous.

Sapling if I'm not mistaken uses the same file structure as git.

aseipp · on Sept 10, 2024

Sapling is a Git client, so it can use Git on-disk structures, if you ask it to. But its native backend, a central server system called Mononoke, is completely different in design and scope, and is in fact designed to handle very large repositories with working sets and data sizes way beyond what Git can handle.

It's all a bit of a weird conversation because if you talk to an OSS programmer, a 5GB repository is "ginormous", and if you talk to some SV tech person working at corporate, they think a 300GiB repository (Windows monorepo) is reasonably large, and if you talk to a gamedev from a one off random studio, they think 2TiB is "pretty average." So you really need to be specific about the workload you're looking at.

hocuspocus · on Sept 10, 2024

> All of Big Tech

Define all? Because that looks more like maybe half to me, at best.

keybored · on Sept 10, 2024

* Half of all Big Tech.

krunck · on Sept 10, 2024

Not Invented Here Syndrome

https://en.wikipedia.org/wiki/Not_invented_here

mhh__ · on Sept 10, 2024

Facebook seem to have a good track record of their "NIH"s being quite good by their own right.

If you're prepared to spend money on it it's not that hard to beat a generic tool for your workflow.

breadwinner · on Sept 10, 2024

So the motivation behind this project primarily is humongous monorepos that git can't handle? Are such huge monorepos a good idea in the first place? The idea behind the monorepo is that you don't have to wait for changes in a dependency to integrate — as soon as the change is checked into the dependency it is available to you. But this also means it is easy for a commit into a dependency to break a lot of downstream projects as it is not possible for the person maintaining the dependency to test all the dependent projects, unless it is acceptable to rely solely on test automation. It would be better for the owners of the dependent projects to decide when to take the new version of the dependency, after thorough testing.

munificent · on Sept 10, 2024

> But this also means it is easy for a commit into a dependency to break a lot of downstream projects as it is not possible for the person maintaining the dependency to test all the dependent projects, unless it is acceptable to rely solely on test automation.

You can invert this sentence to imagine an alternate world:

If you are able to rely completely on test automation, then it becomes possible for a person to commit a change to a dependency without breaking all downstream projects.

Companies with successful monorepos live in that alternate world. Yes, it requires heavy investment in automated testing: you need a culture where every engineer writes tests for all code they write and edit all the time, you need good automated test tooling, and you need to pay for the servers to run the tests all the time.

But if you have that, you get to avoid the hellscape of bit rotting dependencies that no one can touch because too much of the world sits on top of it. I used to work at EA, which at the time had a separate Perforce repo for every game. There was a big push to reuse libraries across multiple games, but game teams had control over when they updated those libraries. I spent six months of my life doing nothing but integrating upstream changes in some libraries into various game repos that were using them. Not entirely coincidentally, I left EA not too long after.

jkaptur · on Sept 10, 2024

The idea is that it is acceptable to rely on test automation, and that teams should apply the "Beyonce Rule" - if you like it then you should've put a test on it.

hinkley · on Sept 10, 2024

If your tests contain 100% of your requirements then you can rewrite the code. If not then you’re fucked.

zeroonetwothree · on Sept 10, 2024

Test coverage at Meta is pretty far off from this

breadwinner · on Sept 10, 2024

And that's true in most companies. So then the question is, how acceptable is it to cause a break in some dependent project by making a change in a dependency. That will depend on the company and the type of product. For a consumer product company like Meta this is probably ok, but for an enterprise company that makes mission-critical software it may be less ok.

amethyst · on Sept 10, 2024

And yet, as someone working on core language infra, we apply exactly that sort of ideal when making changes. If a diff doesn't break any tests, then it's "safe" to land, and if something does indeed break afterwards, then it's the broken team's responsibility to fix forward or otherwise provide proof that it's a big enough problem to roll back. If we end up in SEV review for a change, and there were no broken tests on the diff, then there are going to be some hard questions for the team that didn't write tests.

Ie, tests aren't mandatory, but if you aren't writing tests, it's your responsibility when someone else's change breaks your project.

breadwinner · on Sept 10, 2024

Tests are hard for UI components. Even when the web page has all the expected elements, the appearance may be broken. At least for UI projects, your approach will fail.

oftenwrong · on Sept 10, 2024

These massive companies do employ a significant reliance on test automation. This approach is paired with sophisticated build tools and supporting infrastructure to make it possible to meaningfully test proposed changes, and ensure that they do not break any downstream projects.

edit: This comment I made recently is relevant, since it speaks to the motivations for using a monorepo and trunk-based development within a business: https://news.ycombinator.com/item?id=41293123

xpe · on Sept 10, 2024

> Are such huge monorepos a good idea in the first place?

In just one question, the comment above has done an impressive trifecta; it (1) ventures off-topic; (2) invites a holy war; (3) nerd-snipes. So, let's see how this train wreck unfolds.

mhh__ · on Sept 10, 2024

> Are such huge monorepos a good idea

Yes.

Internal dependencies, automation, buildkite (i.e. let's run other projects CI against our changes), etc are all basically a scam that only makes "sense" in certain open source projects.

MajimasEyepatch · on Sept 10, 2024

Basically all of the big tech companies run on giant monorepos.