I always find monorepo/polyrepo discussions tiresome, mostly because people take the limitations of git and existing OSS tooling as a given and project those failings onto whichever paradigm they are arguing against.
I'm pretty excited for new OSS source control tools that would hopefully help us move past this discussion. In particular, Meta's Sapling[0] seems like a pretty exciting step forward, though they've only released a client so far. (MS released its VFS for Git a while back, but unfortunately it is now deprecated.)
It's like telling someone that throwing all their stuff in one huge box is always better than using smaller boxes. Obviously it depends on the situation.
I strongly prefer the simplicity of a monorepo, but I once worked on a project that used three repos and had IntelliJ keep their branches in sync. Make a new branch, and you make it in all three repos simultaneously. Switch branches, and you switch in all three. That made it very convenient.
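For anyone who wants the same behaviour outside the IDE, here is a minimal sketch of the idea. It is not how IntelliJ does it. The helper names and the repo directory names are made up for illustration, and it assumes a git new enough to have `git switch`.

```python
import subprocess


def branch_commands(repos, branch, create=False):
    """Build one git command per repo to switch to (or create) the same branch."""
    flag = ["-c"] if create else []
    # `git -C <dir>` runs the command as if started in <dir>.
    return [["git", "-C", str(repo), "switch", *flag, branch] for repo in repos]


def run_all(commands):
    """Run each command in turn; check=True raises on the first failure,
    so a half-switched set of repos is at least loud rather than silent."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical repo directories, printed rather than executed.
    for cmd in branch_commands(["frontend", "backend", "shared-types"], "feature-x", create=True):
        print(" ".join(cmd))
```

Building the commands separately from running them makes it easy to do a dry run before touching any working tree.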
The project I'm currently working on just switched from polyrepo to monorepo. Interestingly, front and back end were in a single repo, but there was another repo with a bunch of definitions and datatypes, and a third with a frontend component library that was meant to be shared with another team, but that never happened. And that just made development really awkward.
I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
I'm on the same page with you. A repository is a boundary of responsibility, and repositories should (ideally) be able to evolve independently of each other.
Having a single team develop software across multiple repos does not make sense and creates extra load. The reverse, multiple teams sharing one repo, does not make sense either, and creates a risk of collisions since different teams can touch the same file unintentionally.
Extending from that point, I don't think Git is a bad or insufficient VCS. Like any software, it has opinions, modes of operation, expectations of its users, and limitations. One needs to understand what one is working with.
People badmouthing tools because they don't work the way they expect them to really rubs me the wrong way sometimes. If you can hold a hammer wrong, you can hold software wrong, too. This is why people have been saying RTFM since forever.
> I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
If you have a cross-functional team they might make a repo for the frontend and a repo for the backend, unless steered to do otherwise.
In my personal experience, relying on IntelliJ syncs and not knowing how git works is how we got several emergency production reverts applied in a matter of days, because someone accidentally kept deploying broken changes to production while thinking they were working only locally.
The monorepo decision has little to do with VCS from my perspective - I can't think of a single case where git was the make-or-break decision point. It's primarily about operations, testing, dependency management, and release processes.
For me it comes down to this: do you want to put in the effort up front to integrate all your dependencies in a systematic way at development time? Or do you want small pieces that can evolve independently, effectively deferring system integration concerns to release time?
Monorepo or Manyrepo - either way, someone has to roll up their sleeves and figure out how all the libraries and services fit together. It's just a matter of when and where you do that.
Can't we just generalize the package manager already and push it into VCS? I just want to commit some code and roll it out in the next release. Somewhere in the tree is a top level makefile or something. Stop making this complicated.
When it comes to scale, force versions to be incremented at the same time across the lot. You can even spin up a new deploy set and gracefully hand off load.
My point is. Just let the source code, in whatever language, be distributed as packages in as modular a way as developers want. One way or another you're going to end up with a makefile or shell script that builds the damn thing. If you don't, then someone fucked up and your build is effectively broken. Monorepo or not.
> My point is. Just let the source code, in whatever language, be distributed as packages in as modular a way as developers want. One way or another you're going to end up with a makefile or shell script that builds the damn thing. If you don't, then someone fucked up and your build is effectively broken. Monorepo or not.
The point of a monorepo is that it is not at all modular. You can upgrade shared dependencies all in one go. You can make systems-wide changes with confidence. The monorepo lets you move everything together in lockstep.
Monorepos also allow for incredible sharing potential, but that's less of a selling point.
Yeah, kinda. If you've got a frontend (e.g. phone app) and a backend, you still need to think about your upgrade scenarios. You probably still need some concept of versioning so you can keep track of keeping backend support for whatever apps will still be in the wild for a while.
I say this as a big fan of monorepos - they're great but they don't solve all problems.
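To make the "apps still in the wild" point concrete, here is a rough sketch of the kind of check a backend might do. Everything in it (the `SupportWindow` name, the dotted version format, the example numbers) is an assumption for illustration, not something from a particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportWindow:
    min_client: tuple  # oldest app version the backend still accepts
    max_client: tuple  # newest app version the backend knows about


def parse_version(text):
    """Turn '2.9.4' into (2, 9, 4) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def is_supported(client_version, window):
    """True if the client's version falls inside the backend's support window."""
    version = parse_version(client_version)
    return window.min_client <= version <= window.max_client


if __name__ == "__main__":
    window = SupportWindow(min_client=(2, 3, 0), max_client=(3, 1, 0))
    for candidate in ["2.2.9", "2.9.4", "3.1.0"]:
        print(candidate, is_supported(candidate, window))
```

The repo layout doesn't change any of this: the window has to be tracked somewhere, whether the app and the backend share a commit history or not.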
For sure. You're not even absolved of deploying internal microservices in the correct order during certain classes of migrations, or even changing fields within a single service. Systems at scale are hard and require discipline.
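As a small illustration of the ordering problem, here is a sketch using Python's standard `graphlib`. The service names and the dependency map are hypothetical; a real migration would have its own graph.

```python
from graphlib import TopologicalSorter


def deploy_order(depends_on):
    """Return services ordered so each one comes after everything it depends on.

    `depends_on` maps a service name to the set of services that must be
    deployed before it. A dependency cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical services for illustration only.
    graph = {
        "web": {"api"},
        "api": {"auth", "db-migrations"},
        "auth": {"db-migrations"},
    }
    print(deploy_order(graph))
```

A monorepo can keep this graph in one place, but it still has to be respected at rollout time.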
Either monorepo minus its major downsides or polyrepo minus its major downsides would effectively make the discussion of the trade-offs moot. Whichever happens first will likely “win”, because once people adopt it, too little would be gained from deviating from the norm to justify switching.
> people take the limitations of git and existing OSS tooling as a given and project those failings onto whichever paradigm they are arguing against.
Very much agree. We have the whole "rebase to a single commit for your PR" vs. "keep a history of what actually happened" argument. One side wants to view concise, comprehensible change histories and be able to bisect them to see the origins of bugs etc, the other wants to use an rcs/vcs for one of the primary tasks an rcs/vcs is supposed to undertake - recording and keeping safe a version history of code as it is developed. To each the other side is wrong to even want that.
There have been source control systems in the past that would cater to both quite happily. Nightmarish, terrible, slow, heavyweight source control systems that involved learning an entire configuration language to use effectively, and which I certainly wouldn't recommend using today! (Rational ClearCase, I'm looking at you.) They have existed and conceivably could do so again.
There is a middle ground between "retain the history of every typo anyone ever made" and "squash a month's worth of work into a single commit". Without having to learn a huge amount you can rebase your private branch now and again to squash all those typos and "fix the tests" commits into a single coherent commit describing the step towards the feature you're working on.
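One low-effort way to get there is git's own convention: `git commit --fixup=<commit>` records a commit whose subject is "fixup! " plus the target's subject, and `git rebase -i --autosquash` folds those into their targets. The snippet below is only a toy model of that grouping, written to show the idea. It matches on exact subjects, which is simpler than what git really does, and the function name is mine.

```python
def group_fixups(subjects):
    """Toy model: fold each 'fixup! X' subject into the first earlier commit titled X.

    Returns a list of (subject, folded_fixups) pairs in history order.
    A fixup with no matching earlier commit is kept as its own entry.
    """
    prefix = "fixup! "
    groups = []
    for subject in subjects:
        if subject.startswith(prefix):
            target = subject[len(prefix):]
            for name, folded in groups:
                if name == target:
                    folded.append(subject)
                    break
            else:
                groups.append((subject, []))
        else:
            groups.append((subject, []))
    return groups


if __name__ == "__main__":
    history = ["Add parser", "fixup! Add parser", "Add tests", "fixup! Add parser"]
    for subject, folded in group_fixups(history):
        print(subject, "<-", len(folded), "fixup(s)")
```

The point is that the typo commits still get made while you work; they just don't have to survive into the branch you ask people to review.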
What I'd really love in that context is for GitHub's PR interface to surface individual commits better. I want to be able to step through each commit, reviewing the incremental changes towards a fully working feature, rather than have to review the entire thing as one big blob.
It already does that. Click on the individual commit to only see those changes, then click next to continue. That's the only way I review commits, and it has been available for at least a few years.
> Who wants to record typos or patch after patch in your private branch? That has absolutely no value.
A couple of years ago I wrote most of a custom X509 validation stack in Java before realising we didn't need one after all. There was a way to do what we needed with the standard stuff, so it wasn't in the final PR. Three months later things changed, I did need one, and being able to look it up saved me several days' work.
It can have a huge amount of value.
Who cares about revision history being neat and atomic? It's not been of the slightest consequence to me. It's not like there's a realistic maximum number of revisions you can store in your repo.
But to the original point, clearly both of these features have a use-case, and people want them. Other source control systems in the past (which were much worse in many other ways) catered for this. But the current tension only exists because the dominant source control system doesn't really allow you to pick and choose how you see the data.
I'm so glad that the days of ClearCase are over. Managing multisite replicated VOBs and whatever bullshit viewspec was required to make releases work was a nightmare that I wouldn't wish on my worst enemy. <big tech company> also had some horrendous frontend to ClearCase that was actually used by the engineers, with strange and wonderful interactions that only the guy who left 5 years ago knew about and left for us to rediscover.
One reason I like cleaning up the history before merging is that anyone can then `git blame` and land on a commit that shows the feature/bugfix as a whole and hopefully with a clear explanation in the commit message. Not a bunch of "Fix typo" kind of commits.
[0] https://engineering.fb.com/2022/11/15/open-source/sapling-so...