I always find monorepo/polyrepo discussions tiresome, mostly because people take the limitations of git and existing OSS tooling as a given and project those failings onto whichever paradigm they are arguing against.
I'm pretty excited for new OSS source control tools that will hopefully help us move past this discussion. In particular, Meta's Sapling[0] seems like a pretty exciting step forward, though they've only released a client so far. (MS released its VFS for Git a while back, but unfortunately it's now deprecated.)
It's like telling someone that throwing all their stuff in one huge box is always better than using smaller boxes. Obviously it depends on the situation.
I strongly prefer the simplicity of a monorepo, but I once worked on a project that used three repos and kept them in sync by having IntelliJ manage the branches. Make a new branch, and you make it in all three repos simultaneously. Switch branches, and you switch in all three. That made it very convenient.
The project I'm currently working on just switched from polyrepo to monorepo. Interestingly, front and back end were in a single repo, but there was another repo with a bunch of definitions and datatypes, and a third with a frontend component library that was meant to be shared with another team, but that never happened. And that just made development really awkward.
I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
I'm on the same page with you. A repository is a boundary of responsibility, and repositories should ideally be able to evolve independently of each other.
Trying to develop software in multiple repos with a single team does not make sense and creates extra load. The reverse is also true, and creates a risk of collisions, since different teams can touch the same file unintentionally.
Extending from that point, I don't think Git is a bad or insufficient VCS. Like every piece of software it has opinions, modes of operation, expectations of its users, and limitations. One needs to understand the tool one is working with.
People badmouthing tools because they don't work the way they expect really rubs me the wrong way sometimes. If you can hold a hammer wrong, you can hold software wrong, too. This is why people have been saying RTFM since forever.
> I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
If you have a cross-functional team they might make a repo for the frontend and a repo for the backend, unless steered to do otherwise.
In my personal experience, relying on IntelliJ syncs and not knowing how git works is how we got several emergency production reverts applied in a matter of days, because someone accidentally kept deploying broken changes to production while thinking they were only working locally.
The monorepo decision has little to do with VCS from my perspective - I can't think of a single case where git was the make-or-break decision point. It's primarily about operations, testing, dependency management, and release processes.
For me it comes down to this: do you want to put in the effort up front to integrate all your dependencies in a systemic way at development time? Or do you want small pieces that can evolve independently, effectively deferring system integration concerns to release time?
Monorepo or Manyrepo - either way, someone has to roll up their sleeves and figure out how all the libraries and services fit together. It's just a matter of when and where you do that.
Can't we just generalize the package manager already and push it into VCS? I just want to commit some code and roll it out in the next release. Somewhere in the tree is a top level makefile or something. Stop making this complicated.
When it comes to scale, force versions to be incremented at the same time across the lot. You can even spin up a new deploy set and gracefully handoff load.
My point is. Just let the source code, in whatever language, be distributed as packages in as modular a way as developers want. One way or another you're going to end up with a makefile or shell script that builds the damn thing. If you don't, then someone fucked up and your build is effectively broken. Monorepo or not.
> My point is. Just let the source code, in whatever language, be distributed as packages in as modular a way as developers want. One way or another you're going to end up with a makefile or shell script that builds the damn thing. If you don't, then someone fucked up and your build is effectively broken. Monorepo or not.
The point of a monorepo is that it is not at all modular. You can upgrade shared dependencies all in one go. You can make systems-wide changes with confidence. The monorepo lets you move everything together in lockstep.
Monorepos also allow for incredible sharing potential, but that's less of a selling point.
Yeah, kinda. If you've got a frontend (e.g. phone app) and a backend, you still need to think about your upgrade scenarios. You probably still need some concept of versioning so you can keep track of keeping backend support for whatever apps will still be in the wild for a while.
I say this as a big fan of monorepos - they're great but they don't solve all problems.
For sure. You're not even absolved of deploying internal microservices in the correct order during certain classes of migrations, or even changing fields within a single service. Systems at scale are hard and require discipline.
Either one (a monorepo minus its major downsides, or a polyrepo minus its major downsides) would effectively make the discussion of the trade-offs moot. Whichever happens first will likely "win": once people adopt it, there won't be enough gained by deviating from the norm to justify switching.
> people take the limitations of git and existing OSS tooling as a given and project those failings onto whichever paradigm they are arguing against.
Very much agree. We have the whole "rebase to a single commit for your PR" vs. "keep a history of what actually happened" argument. One side wants to view concise, comprehensible change histories and be able to bisect them to see the origins of bugs etc, the other wants to use an rcs/vcs for one of the primary tasks an rcs/vcs is supposed to undertake - recording and keeping safe a version history of code as it is developed. To each the other side is wrong to even want that.
There have been source control systems in the past that would cater to both quite happily. Nightmarish, terrible, slow, heavyweight source control systems that involved learning an entire configuration language to use effectively, and which I certainly wouldn't recommend using today! (Rational Clearcase, I'm looking at you). They have existed and conceivably could do so again.
There is a middle ground between "retain the history of every typo anyone ever made" and "squash a month's worth of work into a single commit". Without having to learn a huge amount you can rebase your private branch now and again to squash all those typos and "fix the tests" commits into a single coherent commit describing the step towards the feature you're working on.
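For anyone who hasn't tried it, the flow is roughly this (a sketch, assuming the branch was cut from origin/main):

    git fetch origin
    git rebase -i origin/main
    # In the editor, leave one "pick" per logical step and mark the
    # "fix typo" / "fix the tests" commits as "fixup" (or "squash" to
    # keep their messages), then save and exit.
    git push --force-with-lease

The --force-with-lease keeps you from clobbering anything someone else pushed to the branch in the meantime.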
What I'd really love in that context is for Github's PR interface to surface individual commits better. I want to be able to step through each commit, reviewing the incremental changes towards a fully working feature, rather than have to review the entire thing as one big blob.
It already does that. Click on the individual commit to only see those changes, then click next to continue. That's the only way I review commits, and it has been available for at least a few years
> Who wants to record typos or patch after patch in your private branch? That has absolutely no value.
A couple of years ago I wrote most of a custom X509 validation stack in Java before realising we didn't need one after all: there was a way to do what we needed with the standard stuff, so it wasn't in the final PR. Three months later things changed, I did need one after all, and being able to look it up saved me several days of work.
It can have a huge amount of value.
Who cares about revision history being neat and atomic? It's not been of the slightest consequence to me. It's not like there's a realistic maximum number of revisions you can store in your repo.
But to the original point, clearly both of these features have a use-case, and people want them. Other source control systems in the past (which were much worse in many other ways) catered for this. But the current tension only exists because the dominant source control system doesn't really allow you to pick and choose how you see the data.
I'm so glad that the days of ClearCase are over, managing multisite replicated vobs and whatever bullshit viewspec required to make releases work was a nightmare that I wouldn't wish on my worst enemy. <big tech company> also had some horrendous frontend to clearcase that was actually used by the engineers that had strange and wonderful interactions that only the guy that left 5 years ago knew about and left for us to re-discover.
One reason I like cleaning up the history before merging is that anyone can then `git blame` and land on a commit that shows the feature/bugfix as a whole and hopefully with a clear explanation in the commit message. Not a bunch of "Fix typo" kind of commits.
IMO the biggest drawback to a monolith, maybe beyond those listed, is losing the 1-1 mapping of changes to CI to releases. If you know "this thing is broken", the commit log is a fantastic resource to figure out what changed which may have broken it. You submit a PR, and CI has to run; getting most CI platforms cleanly configured to, say, "only run the user-service tests if only the user-service changes" isn't straightforward. I understand there are some tools (Bazel?) which can do it, but the vast, vast majority of systems won't do it near-out-of-the-box, and will require messy pipelines and shell scripts to get rolling.
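To make that concrete, the "messy shell script" usually ends up looking something like this (a sketch only; the services/ layout and the Make target are assumptions, not something a CI platform gives you out of the box):

    #!/usr/bin/env bash
    set -euo pipefail
    # Compare against the merge base with main, not just the last commit.
    base=$(git merge-base origin/main HEAD)
    if git diff --name-only "$base" HEAD | grep -q '^services/user-service/'; then
      make -C services/user-service test
    else
      echo "user-service unchanged; skipping its tests"
    fi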
There are also challenges with local dev tooling. Many VSCode language extensions, for example, won't operate at their best if the "language signaler" file (e.g. package.json for JS) isn't in the root of the repo; from just refusing to function, to intellisensing code in another project into this one, all manner of oddities.
Meanwhile, I don't think the purported advantages are all that interesting. Being able to write the blog post and make the change in the same PR? I've never been in a role where we'd actually want that; we want the change running silently in prod, then publish the blog post, then flip a feature flag or adjust A/B test proportions. Even an argument like "you can add schemas and the API changes in one go": you can do that without a "monorepo", just co-locate your schemas with the API itself. This isn't controversial or weird; this is how everywhere I've worked operates.
None of that is to even approach the scale challenges that come with monorepos at even reasonable sizes; which you can hit pretty quickly if you're, say, vendoring dependencies. Trying to debug something while the Github or Bitbucket UI is falling over because the repo is too large, or the file list is too long, isn't fun.
I'm not going to assert this is a hill I would die on, but I'm pretty staunchly in the "one service, for one deployment artifact, for one CI pipeline, for one repo" camp.
As a big fan of the monorepo approach personally, I would say the biggest benefit is being able to always know exactly what state the system was in when a problem occurred.
I've worked in large polyrepo environments. By the time you get big enough that you have internal libraries that depend on other internal libraries, debugging becomes too much like solving a murder mystery. In particular, on more than one occasion I had a stacktrace that was impossible with the code that should have been running. A properly-configured monorepo basically makes that problem disappear.
This is more of a problem the bigger you are, however.
I think we're just reducing down to "programming at scale" is hard, at some point.
Sure; that is a really big problem, and it becomes a bigger problem the bigger you are. But, as you become bigger: the monorepo is constantly changing. Google has an entire team dedicated to the infrastructure of running their monorepo. Answering the question: "For this service, in this monorepo, what is the set of recent changes" actually isn't straightforward without additional tooling. Asking "what PRs of the hundred open PRs am I responsible for reviewing" isn't straightforward without, again, additional tooling. Making the CI fast is hard, without additional tooling. Determining bounded contexts between individuals and teams is hard, without additional tooling.
The biggest reason why I am anti-monorepo is mostly that: advocates will stress "all of this is possible (with additional tooling)", but all of this is possible, today, just by using different repos. And I still haven't heard a convincing argument for what benefits monorepos carry.
Maybe you could argue "you know exactly what state the system was in when something happened", sure. But when you start getting CI pipelines that take 60 minutes to run, or failed deployments, or whathaveyou; even that isn't straightforward.
And I also question the value of that; sure you have a single view of state, but problems don't generally happen "point in time"; they happen, and then continue to happen. So if we start an investigation by saying "this shit started at 16:30 UTC", the question we first want to have answered is "what changed at 16:30 UTC". Having a unified commit log is great, but realistically: a unified CI deploy log is far more valuable, and every CI provider under the sun just Does That. It doesn't mean squat that some change was merged to master at 16:29 if it didn't hit prod until 16:42; the problem started at 16:30; the unified commit log is just adding noise.
You had the wrong tools. It doesn't matter if you have a monorepo or not, you will need tools to manage your project.
I'm on a multirepo project and we can't have that problem because we have careful versioning of what goes together. Sure many combinations are legal/possible, but we control/log exactly what is in use.
> Sure many combinations are legal/possible, but we control/log exactly what is in use.
I'll acknowledge our tooling could have been better, but isn't it better to just be able to check out one revision of one repo and have confidence that you're looking at the code that was running?
If I have a services based architecture then I can jump straight to the repo for that particular service and have confidence that it is the code that is running.
So instead of adopting a system that makes the problem we’re discussing not possible you use a human-backed matrix of known compatible versions?
Like you do you but I’ve never seen “just apply discipline” or “just be careful” ever work. You either make something impossible, with tooling or otherwise, or it will happen.
No, it is a tool backed matrix. Illegal combinations are not possible, and we have logs of exactly what was installed so we can check that revision out anytime
To solve this properly you need to store the deployed/executed commit id of every service. That could be in the logs, in a label/annotation of a Kubernetes object, or somewhere else. But this has nothing to do with whether you use a monorepo or multiple smaller repositories. In some of my projects, we use the commit of the source repo as the Docker tag, and we make sure that the Docker image build is as reproducible as possible. I.e. we don't always build with the latest commit of an internal library, but with the one that is pinned in the dependency manifest of our build tool. Since updating all those internal dependencies is a hassle, that part is automated: there is an auto-generated merge request to update a dependency for every downstream project, so all the downstream pipelines can run their test suites before an update gets merged. Once in a while that fails, and then a human has to adapt the downstream project to its latest dependencies. In a monorepo that work has to be done as well, but for all downstream projects at once.
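A minimal sketch of that pattern (the registry and service names are placeholders):

    sha=$(git rev-parse HEAD)
    docker build -t registry.example.com/user-service:"$sha" .
    docker push registry.example.com/user-service:"$sha"
    # Surface the same value at runtime so every log line can be traced
    # back to an exact source revision, e.g.:
    #   docker run -e GIT_COMMIT="$sha" registry.example.com/user-service:"$sha"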
Submodules are hell. I work somewhere with a polyrepo topology, with the inevitable "shared" bits ending up integrated into other repos as submodules. Nothing has been more destructive to productivity and caused so many problems.
It can now, but that's not the default. The defaults for submodule suck, because they match the behavior of old versions of git for backwards compatibility.
Yeah. Leaving the UX-issues aside. Don't ever use submodules to manage dependencies inside of each polyrepo, it will eventually accumulate duplicate, conflicting and out of date sub-dependencies. Package managers exist for a reason. The only correct way to use submodules is a root-level integration-repository, being the only repo that is allowed to have submodules.
The only problem I have with a monorepo, is that sometimes I need to share code between completely different teams. For example, I could have a repo that contains a bunch of protobuf definitions so that every team can consume them in their projects. It would be absurd to shove all of those unrelated projects into one heaping monorepo.
Well that's what a monorepo is! I work on one, it's very large, other teams can consume partial artifacts from it (because we have a release system that releases parts of the repo to different locations) but if they want to change anything, then yeah they have to PR against the giant monorepo. And that's good!
Teams out of the repo have to be careful about which version they pull and when they update etc. However, if you are a team IN the monorepo, you know that (provided you have enough tests) breaking changes to your upstream dependencies will make your tests fail which will block the PR of the upstream person making the changes. This forces upstream teams to engage (either by commits or by discussions) with their clients before completing changes, and it means that downstream teams are not constantly racing to apply upgrades or avoiding upgrades altogether.
I work on shared library code and the monorepo is really crucial to keeping us honest. I may think for example that some API is bad and I want to change it. With the monorepo I can immediately see the impact of a change and then decide whether it's actually needed, based on how many places downstream would break.
Ok. I've had some time to think about this, and I am warming up to the idea. It would sure simplify a lot of challenging coordination problems. My only real concern is that the repo may grow so large it becomes very slow. Doubly so if someone commits some large binaries.
I am certainly not a heavy user, but for work I've made myself a "workflow" repository which pulls together all the repositories related to one task. This works super well. There sure is a bit of weirdness in managing them, but I found it manageable. But I'll admit that I don't really use the submodules for much more than initial cloning, maybe I'd experience more problems if I did.
Yes, but it's because submodules are a badly architected, badly implemented train wreck.
There are many good and easy solutions to this problem, all of which were not implemented by git.
git is a clean and elegant system overall, with submodules as by far the biggest wart in that architecture. They should be doused with gasoline and burned to the ground.
I like using submodules for internal dependencies I might modify as part of an issue. I like conan or cargo for things I never will. I don't particularly like conan. Perhaps bazel, hunter, meson or vcpkg are all better.
But this is a discussion of dependencies between services. You need more tooling for managing inter-service dependencies as opposed to package dependencies within one monolith.
> I've worked in large polyrepo environments. By the time you get big enough that you have internal libraries that depend on other internal libraries, debugging becomes too much like solving a murder mystery. In particular, on more than one occasion I had a stacktrace that was impossible with the code that should have been running. A properly-configured monorepo basically makes that problem disappear.
On the contrary, a monorepo makes it impossible because you can't ever check out what was actually deployed. If what was running at the time was two different versions of the same internal library in service A and service B, that sucks but if you have separate checkouts for service A and service B then it sucks less than if you're trying to look at two different versions of parts of the same monorepo at the same time.
There is no source of truth for "what was deployed at time T" except the orchestration system responsible for the deployment environment. There is no relationship between source code revision and deployed artifacts.
Hopefully you have a tag in your VCS for each released/deployed version. (The fact that tags are repository-global is another argument for aligning your repository boundaries with the scope of what you deploy).
Why not? I’m doing it right now. The infrastructure is versioned just like the app and I can say with certainty that we are on app version X and infra version Y.
I even have a nice little db/graph of what versions were in service at what times so I can give you timestamp -> all app and infra versions for the last several years.
Unless your infrastructure is a single deployable artifact, its "version" is a function of all of the versions of all of the running services. You can define a version that establishes specific versions of each service, but that's an intent, not a fact -- it doesn't mean that's what's actually running.
Am I missing some nuance here? Yes the infra version is an amalgamation of the fixed versions of all the underlying services. Once the deploy goes green I know exactly what’s running down to the exact commit hashes everywhere. And during the deploy I know that depending on the service it’s either version n-1 or n.
The kinds of failures you’re describing are throw away all assumptions and assume that everything from terraform to the compiler could be broken which is too paranoid to be practically useful and actionable.
If deploy fails I assume that new state is undefined and throw it away, having never switched over to it. If deploy passes then I now have the next known good state.
Oh, this implies you're deploying your entire infrastructure, from provisioned resources up to application services, with a single Terraform command, and managed by a single state file. That's fine and works up to a certain scale. It's not the context I thought we were working in. Normally multi-service architectures are used in order to allow services to be deployed independently and without this form of central locking.
If what was deployed was foo version x and bar version y, it's a lot easier to debug by checking out tag x in the foo repo and tag y in the bar repo than achieving the same thing in a monorepo.
I'm not sure I understand how that scenario would arise with a monorepo. The whole point of a monorepo is that everything changes together, so if you have a shared internal library, every service should be using the same version of that library at all times.
And every service deploys instantly whenever anything changes?
(I actually use that as my rule of thumb for where repository splits should happen: things that are deployed together should go in the same repo, things that deploy on different cycles should go in different repos)
Not necessarily instantly, but our CD is fast enough that changes are in production 5-10 minutes after hitting master.
But what's more valuable is that our artifacts are tagged with the commit hash that produced them, which is then emitted with every log event, so you can go straight from a log event to a checked-out copy of every relevant bit of code for that service.
Admittedly this doesn't totally guarantee you won't ever have to worry about multiple monorepo revisions when you're debugging an interaction between services, but I haven't found this to come up very much in practice.
Edit: I should also clarify, a change to any internal library in our monorepo will cause all services that consume that library to be redeployed.
> What do you do with libraries shared between different deployment targets?
The short answer is "make an awkward compromise". If it's a library that mostly belongs to A but is used by B then it can live in A (but this means you might sometimes have to release A with changes just for the sake of B); if it's a genuinely shared library that might be changed for the sake of A or B then I generally put it in a third repo of its own, meaning you have a two-step release process. The way to mitigate the pain of that is to make sure the library can be tested on its own without needing A or B; all I can suggest about the case where you have a library that's shared between two independent components A and B but tightly coupled to them both such that it can't really be tested on its own is to try to avoid it.
That's a great test and I think an argument for monorepo for most companies. Unless you work on products that are hermetically sealed from each other, there's very likely going to be tight dependencies between them. Your various frontends and backends are going to want to share data models for the stuff they're exchanging between them for example. You don't really want multiple versions of this to exist across your deployments, at least not long term
I think it's maybe an argument for a single repo per (two-pizza) team. Beyond that, you really don't want your components to be that tightly coupled together (e.g. you need each team to be able to control their own release cycles independently of each other). Conway's law works both ways.
If they have independent release cycles, they shouldn't be tightly coupled (sharing models etc. beyond a specific, narrowly-scoped, and carefully versioned API layer), and in that case there is little benefit and nontrivial cost to having them be in a monorepo.
Not GP, but I use versioned packages (npm, nuget, etc) for that. They're published just like they're an open source project, ideally using semantic versioning or matching the version of a parent project (in cases where eg we produce a client library from the same repo as the main service).
I've had the exact opposite experience. We have a polyrepo setup with four repos in the main stack (and a comical number of repos across the entire product, but that's a different story). My top pain point - almost painful enough to force a full consolidation - is trying to find the source of a regression.
We semi-regularly discover regressions in production and want to know when they were introduced. On any other project I've worked on, that can be done with a simple git bisect. I can tell you that trying to bisect across four repos is not fun. If everything were in a monorepo, I would be able to run the full stack at any point in time.
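In a monorepo that's a stock bisect; the check script and the known-good tag here are assumptions:

    git bisect start
    git bisect bad HEAD
    git bisect good v2024.06.0             # last release known to be good
    git bisect run ./check-regression.sh   # exits non-zero when the bug reproduces
    git bisect reset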
Now, if all your APIs are stable, this won't be as bad. But if you're actively developing your project and your APIs are private, I can only assume this pain will be ever present.
I think my counterpoint is: generally, if I'm playing the part of the owner of some system N layers deep in the rat's nest of corporate systems, I don't even want to think specifically about what broke. I know the dependencies of my system; if I have a dependency on the Users Service, and it looks like something related to the Users Service broke, my first action is probably to go into their Slack channel and say "hey, we're seeing some weird behavior from the Users system; did y'all change something?"
At the end of the day; they're going to know best. Maybe code changed. Maybe someone kubectl edit'ed something manually. Not everything is represented in code.
The problem is that in microservice environments, a lot of complexity and source of bugs are (hidden) in the complex interactions between different components.
I also believe that this mentality of siloing/compartmentalization and habit of throwing things over the fence leads to ineffective organization.
After close to a decade of working in various microservice based organizations, I came to a big-ish monolith project (~100 devs). Analyzing bugs is now fun, being able to just step through the code for the whole business transaction serially in a debugger is an underrated boost. I still need to consult the code owners of a given module sometimes, but the amount of information I'm able to extract on my own is much higher than in microservice deployments. As a result, I'm much more autonomous too.
> Maybe code changed. Maybe someone kubectl edit'ed something manually. Not everything is represented in code.
That's honestly one of the big problems in microservices as well.
> After close to a decade of working in various microservice based organizations, I came to a big-ish monolith project (~100 devs). Analyzing bugs is now fun, being able to just step through the code for the whole business transaction serially in a debugger is an underrated boost. I still need to consult the code owners of a given module sometimes, but the amount of information I'm able to extract on my own is much higher than in microservice deployments. As a result, I'm much more autonomous too.
Could you expand on how you manage ownership of this monolith? Do you run all the modules in the same fleet of machines or dedicated? Single global DB or dedicated DB per module (where it makes sense, obviously)?
Because where I work we have a big monolith with a similar team size and it's a royal PITA, especially when something explodes or is about to explode (but we have a single shared DB approach, due to an older Rails limitation, and we have older Rails because it is difficult to even staff a dedicated team that takes care of tending the lower-level or common stuff in the monolith).
> Could you expand on how you manage ownership of this monolith?
We have a few devops teams (code + deployment) and platform teams (platform/framework code), the remaining teams (which form the majority of devs) own various feature slices. The ownership is relatively fluid, and it's common that teams will help out in areas outside of their expertise.
> Do you run all the modules in the same fleet of machines or dedicated?
Not sure if I understand. All modules run in the same JVM process running on ~50 instances. There are some specialized instances for e.g. batch processing, but they are running the same monolith, just configured differently.
> Single global DB or dedicated DB per module (where it makes sense, obviously)?
There is one main schema + several smaller ones for specific modules. Most modules use the main schema, though. Note that "module" here is a very vague term. It's a Java application which doesn't really have support for full modules (neither packages nor Java 9 modules count). "module" is more like a group of functionality.
> and we have older Rails because it is difficult to even staff a dedicated team that takes care of tending the lower-level or common stuff in the monolith).
This is usually a management problem that they don't pay attention to technical debt and just let it grow out of control to the point where it's very difficult to tackle it.
The critical part of the success of this project is that engineering has (and historically had) a strong say in the direction of the project.
But aren't microservices specifically designed to be able to split responsibility for a large system between multiple teams? If everybody debugs and fixes bugs across the whole landscape, then everybody has to be familiar with everything, which means you are losing the benefits. Occasionally, it might be helpful to debug the whole stack at once. But I wouldn't trust a landscape where that is needed too often. It might be that the chosen abstractions don't fit well.
> But aren't microservices specifically designed to be able to split responsibility for a large system between multiple teams?
That's the idea, but business transactions usually span multiple services and bugs often aren't scoped to a specific service.
> If everybody debugs and fixes bugs across the whole landscape, then everybody has to be familiar with everything
A lot of things can be picked up along the way while you're debugging, and I'm usually able to identify the problem and sometimes even fix it.
> It might be that the chosen abstractions don't fit well.
Very often the case. Once created, services remain somewhat static, and their purpose and responsibility often get muddy, mostly because "refactoring" a microservice architecture is just very expensive and work-intensive. Moving code between modules within a monolith is rather easy (with IDE support); moving code between services is usually not trivial at all.
But that's just that one scenario you've described.
It's also common that if you have a dozen repos that maybe only one has changed and so when there is a defect it's trivial to determine what caused the regression.
I don't think mono or poly repos are better when it comes to triaging faults. They each have strengths and weaknesses.
>> I'm pretty staunchly in the "one service, for one deployment artifact, for one CI pipeline, for one repo" camp
This seems reasonable if nothing is shared. If there are any shared libraries then you are back to binary sharing (package managers, etc.) with this approach.
This looks trivial now, but when you multiply the number of directories by 8 or so it becomes a very nasty mess very quickly.
I think that the idea of only running what changed makes a lot of sense, I just think that managing that in declarative yml falls apart VERY quickly once you hit an inkling of scale.
I just want to comment that you are correct. Bazel allows that and so should any tool that can build dependencies DAGs. Once you have that it's absolutely feasible.
The major issue is that you need to be diligent at bookkeeping your dependencies. Bazel enforces that in the BUILD files and since everything is run in a sandbox you can't easily take shortcuts or you'll get missing dependencies errors.
>Many VSCode language extensions, for example, won't operate at their best if the "language signaler" file (e.g. package.json for JS) isn't in the root of the repo; from just refusing to function
With VSCode, a workaround is to use workspaces. Define a workspace and add each subproject folder as its own entity. VSCode will treat each folder as a project root where the language specific tooling will work as expected.
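The workspace file itself is tiny; something along these lines (folder names are made up):

    # Example project.code-workspace, then open it with the VSCode CLI:
    #   { "folders": [ { "path": "services/frontend" },
    #                  { "path": "services/backend" },
    #                  { "path": "libs/shared" } ] }
    code project.code-workspace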
Your examples of CI and VSCode are on point, and in the bigger picture it's always about tooling.
The mono/multi repo argument is fundamentally boring to me because it always boils down to whether the shape of the tooling problem is easier to work with on this or that side of the divide.
The answer is always whichever tradeoffs work best for your situation, and the reason at the end of your post is as good a reason as any.
> If you know "this thing is broken", the commit log is a fantastic resource to figure out what changed which may have broke it. You submit a PR, and CI has to run; getting most CI platforms cleanly configured to, say, "only run the user-service tests if only the user-service changes" isn't straightforward. I understand there are some tools (Bazel?) which can do it, but the vast, vast majority of systems won't near-out-of-the-box, and will require messy pipelines and shell scripts to get rolling.
I'm not very familiar with the recently trending monorepo tools, but don't they generally provide a way to declare the dependencies between subrepos and prevent each subrepo from importing or otherwise depending on anything outside of those declared dependencies? If that's the case, then wouldn't CI be able to use that same dependency graph to determine when it needs to rebuild/redeploy each particular subrepo?
Well, there aren't "subrepos" … it's all one giant monorepo.
And … no? No, CI tools don't. There's generally not a tool that has the dependency graph, and it's typically not recorded. (Excepting bazel, which set out to solve this problem; lo and behold it was designed by a company with a monorepo, too.)
Some CI systems I've seen have half-assed attempts at it, such as "only run this CI step if the files given by this glob changed". But a.) it requires listing a transitive list of all globs that would apply to the current step, so it's not a good way to manage things and b.) every time I've seen this mis-feature, "change" is described as "in this commit"; that's incorrect. (I have base commit B, I push changes '1 and '2, for commit graph B - '1 - '2 ; CI detects for a step that the globbed files didn't change in '2, and ignores '1. The branch is green. I merge. The result merge commit 'M changes the union of files, so now the tests run, and the commit — now on HEAD — breaks the build. A subsequent unrelated commit M - '3 doesn't modify the relevant code; CI skips the tests and delivers a green result on a broken codebase. People erroneously think "problem fixed". I have seen this all play out in person, multiple times.)¹
(A "much easier" approach is to simply cache a single build step: you hash your inputs, and compute a cache keys; see if your output is cached. If yes use cache, if no build. Computing the cache key is the tricky part and risks that famous "top n problems in computer science … cache invalidation" quote.)
¹while I know how to compute better git diffs, the differences between diffing against the common ancestor, diffing the result once the commit gets merged, etc. are subtle. Most devs are shockingly inexperienced with git and don't even get this far into the problem, and CI systems' insistence on only running on, e.g., "pushes" doesn't help.
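For what it's worth, the diffs that avoid the failure mode above aren't complicated, they're just not what CI systems hand you by default (branch names assumed):

    # On a branch: everything the branch changes relative to where it forked
    git diff --name-only origin/main...HEAD
    # On main, after a merge: everything the merge brought in (vs. first parent)
    git diff --name-only HEAD^1 HEAD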
Sure. You can't go halfway on monorepos, where you check in all the code into one spot but don't build any tooling to manage it. You need to use something like Bazel/Blaze, Buck, or other tools that are meant to own responsibility for managing dependencies between projects.
> Well, there aren't "subrepos" … it's all one giant monorepo.
I think op meant organizationally you still have logical components.
> And … no? No, CI tools don't. There's generally not a tool that has the dependency graph, and it's typically not recorded. (Excepting bazel, which set out to solve this problem; lo and behold it was designed by a company with a monorepo, too.)
Everywhere I've worked that has had a monorepo (Google, Facebook), that's definitely the case. The CI automation would query Buck/Bazel to figure out the set of dependencies impacted by a PR. Of course, some PRs would have outsized impacts (e.g. changing libc) but at the same time, there's probably not much better than that.
Apple was a bit different. While nominally each project had its own git repository, you uploaded code to a central monorepo that organized everything by release. And it built every project incrementally and relinked/rebuilt whichever projects were impacted. They didn't at the time have a centralized CI system. But Apple's system evolved from many decades ago and was a sane choice for building an OS back then.
Google's approach is, I think, generally accepted as a more effective strategy in some ways if you're going down that route. That being said, at Google scale you're shipping so much code that there are still challenges. For example, there's so much code being changed at Google's scale that they have to bundle things together into a single CI pass, because there's just insufficient compute capacity available to do everything and avoid serialization of unrelated components. Of course, probabilistically there's a non-zero chance that something is broken, and they intelligently bisect and figure out what change needs to be omitted from the ship. Very complicated.
But I think most people underappreciate that they'll never encounter these kinds of problems. If you just go all in on monorepo + Bazel + Bazel-aware CI setup + build artifact caching, you're done and don't have to think about builds or code management very much after that. That's a really big superpower.
> I think op meant organizationally you still have logical components.
Yes, my impression was that all of these monorepo tools have a first-class notion of the subrepos/projects/workspaces/whatever that make up the monorepo. If you don’t have that, then I guess I don’t really know what you mean when you say you have a monorepo.
I don't understand. You can have 1 repo with multiple services that can be deployed independently.
Edit: perhaps the difference is that you said "monolith." I guess I'm not sure precisely what you mean by this, but context makes it seem like you're using it synonymously with monorepo. Since that's what this thread is about.
There's a very simple solution to that, if your systems & processes are reasonably lightweight.
Just build and deploy everything on every merge. Compute is fairly cheap, and if running in parallel, it doesn't have to take long.
You can also take it a step further and have "mono" binaries/container images, where you specify the service to execute as the first argument.
I've been doing this for about 5 years now, having a single output artifact for each language being used. It works great.
If you're careful about your optimisations, you can go from hitting the merge button to having 100+ services deployed on production in about 60 seconds
Arguably it's a bit of an extremist approach, but if you have a situation where technically you're deploying thousands of times a day, you get pretty good at making the process reliable and robust
> If you know "this thing is broken", the commit log is a fantastic resource to figure out what changed which may have broke it
git supports both diff and log for specific directories, although this may not help you if the issue was with a dependency in another folder that was updated.
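e.g. (directory names and the tag are just illustrative):

    git log --oneline -- services/user-service/
    git diff v1.4.0..HEAD -- services/user-service/ libs/shared/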
But the point is that the tooling doesn't help you with it. Those are building blocks that you might build a "does this need to get build/deployed?" (& if not, what is the result of the build) mechanism with, but they are not that mechanism.
I agree that the blog post didn't make a very strong argument, but there are some inaccuracies in your comment as well.
With regard to the "only run X tests if X changed" problem, Bazel, Buck, and all the other monorepo build tools do this. I mean sure, if you're using some build system not meant for a monorepo you're going to have a terrible time, but who is really going to spend weeks or months converting to a monorepo and not also switch to Bazel (or something akin to it?) at the same time. In fact, I would say switching to Bazel (or Buck, etc.) for builds is a prerequisite to even starting on the path to switch to a monorepo.
This is just a really useful feature even if you're not in a monorepo. Sometimes you're changing some core header file or whatever and you really do need to run nearly all the tests in your test suite. Sometimes you're just changing some fairly self-contained file and only a few tests need to run. Sometimes you change some docs in the repo and you don't need to run any tests at all. Bazel will just do this automatically for local builds (it knows what tests have transitive dependencies that have changed since the last time those tests were run), and setting it up in CI is a few lines of bash or Python. To set this up in CI you basically just check which files have changed since the last time CI ran (e.g. using git diff), then you use bazel query to find all test targets that have transitive dependencies on those files, then you feed that list of test targets to bazel test. You can set this up per-developer branch for instance, so that if you have a bunch of developers all running tests on the same set of CI machines you get good caching.
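The CI side of that really is only a few lines. A rough sketch (the labels are example inputs; the step that maps changed files to source-file labels is glossed over here):

    changed_labels="//libs/auth:token.cc //services/user:handlers.py"  # example input
    bazel query "tests(rdeps(//..., set($changed_labels)))" \
      | xargs --no-run-if-empty bazel test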
With regard to colocating schemas in APIs, yes you can do that but it's really annoying to do with Protobuf/Thrift. First of all protobuf and thrift require that IDLs exist locally so they can do code generation, so if you have protobuf files split into multiple services you need a way to distribute them all which is super annoying. Additionally, in some cases there isn't a clear single owner of a particular IDL struct, for example let's say you have some date or time struct that many protobuf messages want to use in their fields. Which service do you define it in? Ignoring that, it is REALLY USEFUL to be able to modify the code for the producer of a message and the consumer of a message all at once, without having to make multiple commits in multiple repositories. This is especially true when it comes to testing. I have thing A producing a new field X, I want B to use the new field X, and I want to test that B uses it correctly. When everything is in one repo this just works, with multiple repos I need to first add the code to thing A, do a release of A (even if that's just making and pushing a git commit), then update B to consume the new thing and add the test, then if I realize I messed something up I need to go update repo A again to test it, and so on. Obviously this works and tons of people do it, but it sucks. I had to do this at my last job (which wasn't using a monorepo), and it worked but it was cumbersome and I hated it.
With regard to scaling, a lot has changed in git in the last two years to make it possible to run huge git monorepos without any weird hacks. The most notable such feature is sparse indexes, which let you clone a subset of a git repo locally and have it work normally. Here's a GitHub blog post about sparse indexes: https://github.blog/2021-11-10-make-your-monorepo-feel-small... . They also have a monorepo tag which you can use to look at other blog posts about monorepos (and as you'll note, most of these are pretty recent): https://github.blog/tag/monorepo/
The biggest downside of a monorepo in my opinion is that there are a lot of things that Bazel makes way harder than the default package manager for language X. Practically speaking it's probably going to be hard to use Bazel if you don't have a dedicated build team with experts who can spend all the time to figure out how to make Bazel work well in your organization. This is pretty different from just using pip or npm or yarn or whatever, where you can get things just working in a couple of minutes and the work to maintain the build system is probably just collectively a few hours of work a week from people spread throughout the organization who can be on any team. For a small organization I can't see it being worth the effort unless you already have a lot of engineers who have a background in Bazel, for example. But there's definitely a point where the high entry-level cost to Bazel and a monorepo makes sense.
> you need a way to distribute them all which is super annoying
Only if your tools are bad. In the stack I'm used to they're just another artifact that gets published as part of a module build.
> Additionally, in some cases there isn't a clear single owner of a particular IDL struct, for example let's say you have some date or time struct that many protobuf messages want to use in their fields. Which service do you define it in?
A common library, just like any other kind of code.
> Ignoring that, it is REALLY USEFUL to be able to modify the code for the producer of a message and the consumer of a message all at once, without having to make multiple commits in multiple repositories. This is especially true when it comes to testing. I have thing A producing a new field X, I want B to use the new field X, and I want to test that B uses it correctly. When everything is in one repo this just works, with multiple repos I need to first add the code to thing A, do a release of A (even if that's just making and pushing a git commit), then update B to consume the new thing and add the test, then if I realize I messed something up I need to go update repo A again to test it, and so on.
That's actually really important because it forces you to test all the intermediate states. If you can just change everything at once then you will, and you probably won't bother testing everything that actually gets deployed, and so you get into a situation where if you could deploy the new version of A and B at exactly the same time then it would work, but when there's any overlap in the rollout everything breaks horribly.
It really sucks to manage shared libraries, across many clients in a few languages. Any update requires updating the version everywhere, and you have to independently debug which version of which library was being used by which application at the time a bug occurred. It works, it isn't impossible, obviously lots of people manage it successfully (polyrepos are more common than monorepos in my experience), but it's a giant pain and it sucks.
> Any update requires updating the version everywhere
Much of the benefit of using an IDL like this is to be (mostly) forward compatible, so you don't have to upgrade everywhere immediately.
> you have to independently debug which version of which library was being used by which application at the time a bug occurred
You have to do that anyway; it's easier if your repository history reflects the reality of which applications were upgraded and deployed at which times. There's nothing worse than having to debug a deployed system that was deployed from parts of a single repo but at several different points in that repo's history.
Splitting your business’s mission across hard repository boundaries implies that… you know what you are doing! If that’s you then congratulations. Also: you’re kidding yourself.
For the rest of us, being virtually unable to rethink the structure of our components because they are ossified as named repositories is a technical and social disaster. Whole teams that should have faded and been reabsorbed elsewhere will live forever because the effort to dismantle and re-absorb their code into other components is astronomical.
The value we bring as engineers is in making sequences of small changes that keep us moving towards our business goal. Boundaries that get in the way are anathema to good engineering. It's exactly as if you were unable to move code between top-level directories of your project. Ridiculous.
> Rolling out API changes concomitantly with downstream changes to the documentation or the OpenAPI spec.
> Introducing feature-level changes and the blog post announcing those changes.
These are horrible reasons to use a monorepo. Commits are not units of deployment. Even if you're pushing every system to prod on every commit, you'd still basically always want to make the changes incrementally, system by system, and with a sensibly sequenced rollout plan.
To take one of the examples above, why would you ever have the code implementing a feature and the announcement blog post in the same commit? The feature might not work correctly. You'd want to be able to test it in a staging environment first, right? Or if you don't have staging, be able to run it in prod behind a feature flag gated to only test users, or as a dark launch, or something to verify that the feature is working before letting real users at it and having it crash your systems, cause data corruption, or hit some other critical problem that would necessitate a rollback. But none of this pre-testing is possible if the code changes are really being made in the same commit as the public announcement.
And talking of rolling back... When you revert the code changes that are misbehaving, what are you doing with the blog post? Unpublish it? Or do some kind of a dirty partial rollback that just reverts the code and leaves the blog post in place?
The same goes for any kind of cross-project change[0], some of which appear more compelling on the surface than the "code and blog post in one" use case (e.g. refactoring an API by changing the interface and callers at the same time). Monorepos allow for making such changes atomically, but you'd quickly find out that it's a bad idea. There are great reasons to use monorepos, but this is not it.
> you'd still basically always want to make the changes incrementally, system by system, and with a sensibly sequenced rollout plan.
Depends. It's significantly faster to deploy everything at the same time and accept that unlucky requests might end up in a weird state than to safely sequence changes.
In SRE phrasing, I'm choosing to spend our error budget to maximize change velocity by giving up on compatibility during deploys by skipping a multi-stage rollout plan. In return, I can condense a rollout to a single commit and deploy. A 99.9% availability target yields up to 86 seconds per day to pretend that deploys are "atomic".
Did you ever have to roll back some unlucky changes? Specifically rolling back, not fixing them frantically with several layers of fixes on top of the buggy deploy?
Yes, absolutely. Nonetheless, I think the author may be right, except that by "Google" they mean "large." This is a fundamental misunderstanding of just how large Google and its peers are. I think it's more interesting to consider three sizes.
If you're small, everything will fit nicely in a monorepo.
If you're large, you'll want lots of repos. There aren't really any off the shelf monorepo options that scale super well, so using a bunch of small repos is a great way to deal with the problem. Plus, you probably don't have a full time staff babysitting the source repos, so you want some isolation. If someone in another org is breaking stuff left and right, you don't want the other orgs to be affected.
If you're GIGANTIC, monorepos are a pretty great option again. You'll probably have to build your own and then have a full time group of people maintain it, but that's not a huge problem for you because you're a gigantic tech company. You can set up an elaborate build system that takes advantage of the fact that the entire system is versioned together, which can let you almost completely eliminate version dependency hell. You can customize all of your tools to understand the rules for your new system. It's a huge undertaking, but it pays off because you've got a hundred thousand software engineers.
> There aren't really any off the shelf monorepo options that scale super well
How can you say this when Perforce on a single machine took Google to absolutely terrifying scale? There is no chance that your mid-sized software company will even slightly tax the abilities of Perforce.
What I believe you meant was there aren't really any good options to make git tolerable for non-trivial projects, and with that I wholeheartedly agree. And that's why these threads are so tiresome: they always boil down to people talking about what git can and cannot do.
Google wrote a whole paper on the fact that, with the help of a beast of a single computer, they were able to get Perforce to work for 10,000 employees averaging about 3 commits per second (20 million commits over 11 years) and a much higher volume of other queries. That white paper pointed out that Google had taken performance to the "edge of Perforce's envelope" and they were only able to do that by treating Perforce's performance limitations as a major concern and striping a fleet of hard drives on that machine with RAID 10.
That's not an endorsement for a company as big as Google was then looking for an easy, off the shelf solution. It'd probably be just fine for a company of hundreds, but so would git.
On the other hand, if you play to its strengths, it's probably a great choice. Maybe a team of dozens of content developers checking in large assets for videogames. Perfectly great use case for Perforce.
Yes, but the Linux kernel is still "non-trivial". You said git is not tolerable for non-trivial projects. I think you just meant that it isn't tolerable for "incredibly large" repos, which I do think is right.
It's just a boring semantic point that I'm making, that "non-trivial" was a hyperbolic word choice.
I'd argue that using git's sparse-checkout functionality and enforcing clean commits (such as via the patch-stack workflow and maintaining a hard line approach against diff-noise) does a lot of heavy lifting for handling git monorepos.
Sparse checkouts, shallow fetches/clones, partial clones, etc. allow you to work with an egregiously large repository without ever needing to actually mess with the whole thing. Most existing build tooling can be made to work with these features pretty easily, although some tools are easier than others.
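For reference, the combination that does most of that heavy lifting looks roughly like this (repo URL and paths are placeholders):

    git clone --filter=blob:none --sparse https://example.com/big-monorepo.git
    cd big-monorepo
    git sparse-checkout set services/user-service libs/shared
    git sparse-checkout add tools/ci   # widen the checkout later as needed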
Enforcing clean commits avoids the issues with keeping track of individual project histories and past that the existing git tooling largely already supports filtering commits to only expose commits relevant to specific directories/pathspecs.
---
The only time I really see an organisation outgrowing a monorepo is if the org is incapable of or unwilling to maintain strict development and integration policies.
Also worth noting because I don't see it mentioned enough: not everything has to be in the same monorepo. Putting all closely related products and libraries in the same monorepo is kosher, but there's little reason for unrelated parts of an org's software to all be in the same monorepo. So what might be 50-200 independent projects/repos could be 3-20 monorepos with occasional dependencies on specific projects in the other monorepos.
It really paints a picture of the author's credentials to be making this declaration.
All signs for the rest of the world point to the opposite conclusion: unless you're Google-scale, you don't have Google level resources. Google has more engineers working on developer experience than most companies will ever have period.
And monorepos work best when workflows are carefully thought out with clever application specific tooling.
-
I think the author is probably working on a 1-10 developer project (and I'm leaning towards 1) and has confused the convenience of having things in reach when the entire system fits in your mind with the general benefits of a monorepo.
I also wonder if they read any of the letters they linked to...
> All signs for the rest of the world point to the opposite conclusion: unless you're Google-scale, you don't have Google level resources. Google has more engineers working on developer experience than most companies will ever have period.
I feel like this was true ~5 years ago, but these days the tooling around scaling monorepos is safely supporting O(100) developers without a lot of overhead.
> And monorepos work best when workflows are carefully thought out with clever application specific tooling.
I don't see any meaningful distinction with how well thought out workflows need to be between mono and polyrepo.
> I don't see any meaningful distinction with how well thought out workflows need to be between mono and polyrepo.
You completely failed to parse the sentence. The point being made isn't "well-thought-out workflows are only for monorepos"; the qualifier applies to the "needing clever application-specific tooling" part.
Vendoring/Versioning for discrete packages is a heavily invested in problem space for most tech stacks you'll come across. But if you build a monorepo and don't end up with a build system that takes on those responsibilities, you end up with something that doesn't scale to even moderately large interconnected components.
OP has such a tiny project that I'm not convinced they're even dealing with dependencies in a traditional sense. But well before you get to Google scale, you'll run into situations where you just want to change one thing and don't want to change every single downstream dependency which normally would have been isolated via a discrete package that doesn't have to change in lockstep. And then that exact same pain starts to exist for deployments and needs to be worked around.
> I feel like this was true ~5 years ago, but these days the tooling around scaling monorepos is safely supporting O(100) developers without a lot of overhead.
Nothing about the above has changed in the last 5 years; it's kind of the ground truth of monorepos vs. multiple repos. You're the first person I've ever seen imply monorepos don't offload complexity to tooling, even amongst proponents.
> But well before you get to Google scale, ... doesn't have to change in lockstep.
Not updating dependencies is the equivalent of never brushing your teeth. Yes, you can ship code faster in the short term, but version skew will be a huge pain in the future. A little maintenance every day is preferable to ten root canals in a few years.
As you scale a small company it's exceedingly rare to not need 10 root canals along the way. Meanwhile it's exceedingly common to need to pivot quickly even if it comes at the cost of near-term engineering rigor.
I feel obliged to point out that I work at a company that uses a monorepo, so this isn't a "never use monorepos" counter-post. Instead my points are borderline tautological:
There's a balancing of near-term sacrifice vs. long-term sustainability. But you need good reasons to pick the side of the scale that historically has had fewer resources invested into it, and that puts the onus on your engineering team to adjust to the knock-on effects of that disparity while still building a fledgling company.
> Not updating dependencies is the equivalent of never brushing your teeth
That's a strawman: the choice is not between updating and not updating. The choice is between updating on my terms or not.
I recently updated stripe from 2.x.x to 5.x.x in one of the projects. That's several years without updates. Wouldn't it be fun if somebody was forced to update multiple projects every single time stripe ships a new minor version? And if we were to do the true monorepo, at what pace do you think stripe would be updated, if it was their responsibility to update all dependents?
You're conflating management of external dependencies with internal dependencies. Ideally, Stripe is actually acting as a Library Vendor here, and so these are long-lived major versions with well-defined upgrade paths and surface area. Within your company you don't want every team to have to operate as a Library Vendor, and you also want to take advantage of the command economy you operate in to drive changes across the company rapidly.
Also, Amazon went through this whole thing. They have tons of tooling built up around managing different versions of external and internal dependencies and rolling them out in a distributed fashion. They are doing polyrepo at a scale that is unmatched by anyone else. And you know what they've settled on? Teams getting out of sync with the latest versions of dependencies is a Really Bad Thing, and you get barked at by a ton of systems if your software is stale on the order of days/weeks.
> Within your company you don't want every team to have to operate as a Library Vendor
But you want some teams to operate this way. And the best way to do it is by drawing boundaries at the repo level.
This is similar to monolith-services debate. Once monolith gets big enough there's benefit in breaking it down a bit. Technically nothing prevents you from keeping it modular. Except that humans just really suck at it.
> take advantage of the command economy you operate in to drive changes across the company rapidly
Driving changes across the company is a self-serving middle-manager goal. There's a reason why central planning fails at scale every single time it is attempted.
> Teams getting out of sync with the latest versions of dependencies is a Really Bad Thing
It definitely can be a bad thing. But you know what's even worse? Not having the option to get out of sync. If getting out of sync is a problem, polyrepo offers simple tooling to address it.
The assumption that you are making is that polyrepos will spend the vast amount of engineering effort to maintain a stable interface. Paraphrasing Linus: “we never break userspace.”
In practice internal teams don’t have this type of bandwidth. They need to make changes to their implementations to fix bugs, add optimizations, add critical features, and can’t afford backporting patches to the 4 versions floating around the codebase.
Repos work for open source precisely because open source libraries generally don’t have a strong coupling between implementers and users. That’s the exact opposite for internal libraries.
> In practice internal teams don’t have this type of bandwidth
You don't need bandwidth to maintain backward compatibility in polyrepo. As you said yourself, you need loose coupling.
When you are breaking backward compatibility, the amount of bandwidth required to address it is the same in mono- and polyrepos (with some exceptions benefitting polyrepos).
The big difference though is whose bandwidth we are going to spend. Correct me if I'm wrong, but my understanding is that at Google it's the responsibility of the dependency to update its dependents. E.g. if the compiler team is breaking the compiler, they are also responsible for fixing all of the code that it compiles.
So you're not developing your package at your own pace, you are limited by company pace. The more popular a compiler is, the slower it is going to be developed. You're slowing down innovation for the sake of predictability. To some degree you can just throw money at the problem, which is why big companies are the only ones who can afford it.
> can’t afford backporting patches to the 4 versions floating around the codebase
Backporting happens in open-source because you don't control all your user's dependencies. Someone can be locked into a specific version of your package through another dependency, and you have no way of forcing them to upgrade. But if we're talking about internal teams, upgrading is always an option, you don't have to backport (but you still have the option, and in some cases it might make business sense).
> open source libraries generally don’t have a strong coupling between implementers and users. That’s the exact opposite for internal libraries.
I disagree. There's always plenty of opportunities for good boundaries in internal libraries.
Though I'll grant you, if you draw bad boundaries, polyrepo will have the problems you're describing. But that's the difference between those two: monorepo is slow and predictable, polyrepo is fast and risky. You can reduce polyrepo risks by hiring better engineers, you can speed up monorepo (to a certain degree) by hiring more engineers.
When there's competition, slow and predictable always loses. Partially that's why I believe Google can't develop any good products in-house: pretty much all their popular products (other than search) are acquisitions.
> Vendoring/Versioning for discrete packages is a heavily invested in problem space for most tech stacks you'll come across.
100% disagree. The problem of "How do I define my dependencies and have a package manager reify that into a concrete set of versioned dependencies" may be a solved problem, but the tools for tracking dependencies across many repos and driving upgrades is neolithic. About the only company I've seen that does this well is Amazon, and they have yet to sell us version sets as a service.
> OP has such a tiny project that I'm not convinced they're even dealing with dependencies in a traditional sense. But well before you get to Google scale, you'll run into situations where you just want to change one thing and don't want to change every single downstream dependency which normally would have been isolated via a discrete package that doesn't have to change in lockstep. And then that exact same pain starts to exist for deployments and needs to be worked around.
As I alluded to above, the dual of this is that getting everyone to update their dependencies is orders of magnitude more difficult when you have a polyrepo setup, even if we're working under the ideal situation where repo setup is standardized to such a degree that a person can parachute into a repo and become effective within minutes.
> Nothing about the above has changed in the last 5 years; it's kind of the ground truth of monorepos vs. multiple repos. You're the first person I've ever seen imply monorepos don't offload complexity to tooling, even amongst proponents.
Both polyrepo and monorepo have complexity in scaling that is handled by their tooling. The difference historically is that OSS polyrepo tooling has been better and better-integrated, because that's just how most things are built in any language, but that has been improving over the past ~half decade:
* Bazel maintenance complexity has dropped precipitously, and many of the initial bottlenecks you hit with it have been solved in OSS.
* If you're anti-bazel, gradle and cargo monorepo support is decently good. I believe the same is true in js these days, but I don't have hands-on experience
* Services for managing monorepos like sourcegraph for codesearch or mergify for submit queue now exist that make it easy to adopt the patterns that work well at large companies.
* Microsoft and others have invested in git to improve scalability of developing against large repos.
* You have OSS tools like git-branchless that further improve the experience of working in a monorepo
There are a bunch of companies in the O(100) - O(1000) developer range that are using this stuff and it works very well.
It's awkwardly phrased, yes, but what the author is saying is that Google-tier scale companies will have an awful time migrating from poly to mono-repo, and a not fun time being mono-repo for the first few years.
For companies not at this tier, it isn't that hard to migrate to a monorepo and the benefits will be more immediate because the tools (e.g. git) won't be screaming under the load.
(My personal 2c is that you can be well below Google-scale and still hit the limits of the common tooling when using monorepos. Canva, Stripe, and Twitter are examples)
My understanding is that Meta/Google had to rewrite tons of stuff (distributed file systems, VCS, search tools, build tools...) and need many teams to maintain all these systems, and sometimes their developer experience is inferior to what you get with off-the-shelf OSS. I'm not super familiar with this topic as I don't work with systems of that scale, but I'm wondering: was it worth it or necessary? What would be the alternative?
> sometimes their developer experience is inferior to what you get with off-the-shelf OSS
Working at Meta, being able to use Sapling (our Mercurial fork) is actually one of the highlights and I was pretty miserable every time I needed to go back to Git for my open-source work — thankfully it was open-sourced with a git backend a couple of weeks ago, so now I get to combine the excellent sapling CLI with the github hosting service :D
> sometimes their developer experience is inferior to what you get with off-the-shelf OSS.
The internal version control system that I use (Fig, which is apparently based off Mercurial) is so good I've basically never had to think while using it. I can't really say the same about git.
CodeSearch is also, AFAIK, a lot better than what you would get OSS.
I think the productivity gains across 100K engineers is definitely worth it.
I have to say, I really enjoy using Google monorepo and all tools available to me.
Having said that, it's not free of course. There are many engineering teams and computer power being invested to provide this environment to all googlers.
I've seen what happens when an org tried to adopt a Google-style monorepo without making the investments in build tooling and cultural change. It was a disaster.
All of those things are needed even at orgs much smaller than Google, or you will end up with an unbuildable, unmaintainable, unreleasable mess.
For orgs that can't make those investments, I think a repo per team is the best approach. Each team can treat their repo like their own little Google-style monorepo if they want to.
A lot of the work is only because of the large size of the codebase. E.g. Forge and ObjFS, which is necessary because compiling the really large binaries on a normal workstation would OOM it. Or take days. https://bazel.build/basics/distributed-builds
If your codebase is "normal-sized," you don't need nearly that amount of infrastructure. There is probably some growing pain when transitioning from normal-sized to "huge," but that's part of the growing pain for any startup. You're going to have to hire people to work on internal tooling anyway; setting up a distributed build and testing service (especially now there are so many open-source and hosted implementations) is worth the effort once you're starting to scale. You're going to have to set that up regardless of a mono-repo or many separate repos.
It's probably only worth hiring serious, dedicated teams that work on building like Google once your CI costs are a significant portion of operation. That probably won't happen for a while for most startups.
I think that's a bit misleading (disclaimer: I very much like Bazel existing, though I think a better version of it could exist somewhere).
Surely a lot of work is put into Bazel core to support huge workflows. But a huge amount of work is put into simply getting tools to work in hermetic environments! Especially web stack tooling is so bad about this that lots of Bazel tools are automatically patching generated scripts from npm or pip, in order to get things working properly.
There is also incidental complexity when it comes to running Bazel itself, because it uses symlinks to support sandboxing by default. I have run into several programs that do "is file" checks that think that symlinks are not files.
Granted, we are fortunate that lots of that work happens in the open and goes beyond Google's "just vendor it" philosophy. But Docker's "let's put it all in one big ball of mud" strategy papers over a lot of potential issues that you have to face front-on with Bazel.
Personally I think this is what companies should do -- it guarantees hermeticity as you say, guards against NPM repo deletion (left-pad) and supply chain attacks. But for people who are used to just `npm install` there is a lot more overhead.
Personally I don't think there is that much value to society in endlessly vendoring exactly the same code in various places. This is why we have checksums!
I understand that Google will do this stuff to remove certain stability issues (and I imagine they have their own patches!), but I don't think that this is the fundamental issue relative to practical integration issues that are solvable but tedious.
EDIT: I do think people have reasons for vendoring, of course; I just don't think it should be the default behavior unless you have a good reason.
For everyone who complains about monorepos, remember that some of the most forward-thinking engineering companies like Google and FB also use monorepos. All the arguments people make in favor of polyrepo really come down to the lack of strong tooling for monorepos. That's also why Google and FB would not have scaled if they were using GitHub / GitLab, and instead had to build their own. Also, Google's original source control was built on top of Perforce!
The issue with building a custom monorepo system that can handle Google's and Facebook's scale is that it fails to scale down, even to moderately large project and organizational size. It's expensive (think at least 7 figures opex) and not what most people should be doing.
git, for all its issues (and I'm a git-hater), scales down to an individual coder and scales up (with a lot of hacks, the hacks being used varying depending on whether you're taking a poly or mono approach) to companies that employ thousands of developers.
Polyrepo scales to thousands coders without anyone even noticing, and that's the beauty of it.
Just look at the size of node_modules in an average project. You stand on the shoulders of thousands of other engineers and you don't even think about it. That's your polyrepo at work.
Now imagine that every time a dependency wants to ship a new version, its maintainers have to update all of the dependents. That's your monorepo.
It is quite obvious which one is more scalable.
The only real benefit monorepo has is that every dependency is always at its latest version. But the cost to achieve that... let's just say there's a reason you mostly hear about monorepos from Google.
For anyone who complains about dictatorships, remember some of the most resource-abundant countries are dictatorships.
Those companies should be one of last places to look for good software development practices. They have absolutely no incentive to recognize their own mistakes.
a) There is no correlation between mono/poly repos and your ability to scale. There are many examples of successful companies using either approach.
b) As a general rule people should be cautious about adopting approaches and technologies from Google, Meta without a clear understanding of why they need it. What works at their scale doesn't always apply to smaller teams.
> remember some of the most forward thinking engineering companies like Google and FB also use monorepos
As a counterpoint though I’d say that the issues Google and FB face, particularly in terms of the sheer scale of the work they’re doing, are pretty unique.
Google literally invents programming languages for domains it feels needs them, I’m not about to blindly follow that practise either.
The way at least FB is using a monorepo is very different from any kind of monorepo most people imagine. It's not just about tooling; git itself could never handle it. I am all for using a monorepo, but Google and FB having one is not an argument for it.
IIRC originally neither mercurial nor git would scale enough, but mercurial was much more willing to accept scalability-related patches, so long as they didn’t harm the more common small-scale use-cases. After a couple of years of submitting patches for moderate improvements, meta wanted to make some more controversial large-scale changes like dropping support for sequential commit numbers, and ended up hard-forking and breaking compatibility to do that. That incompatible-but-better-performing fork then stayed internal for a while, before recently being released as Sapling.
As a bonus, part of the rewrite of the internal storage engine involved creating a storage engine abstraction layer, which in turn made it easy to add Git as a backend :D
I'd like to add my own PSA in concurrence with this.
Just use a monorepo. Use tooling to work around its limitations if you reach that point.
I work in a SaaS that's polyrepo based, having split from its original monorepo as part of a microservices push (which never succeeded, leaving us stuck half way in the worst of both worlds).
Nothing has been more destructive to productivity than the polyrepos and their consequences. We're talking a 20% engineering spend dead weight loss.
This is stark obvious to every single engineer, but trying to get people to accept that fact and sign off on a project to coalesce them back into a monorepo is just insurmountable.
Polyrepos are irreversible damage, stay away from them. Hold the line on your monorepo. One organisation, one repo.
As a special case: polyrepos are fine when you have a genuine plugin architecture. To be a genuine one: you have public-facing, end-to-end documentation on how to use it.
I would use chrome extensions vs. chrome itself as an example. Or same for VSCode. Then vendor plugins can be in their own repos.
I think otherwise for simple module A and module B that interact privately, in an adhoc way, the separation of concerns is not there and monorepo is better.
Absolutely this. I am experiencing a polyrepo setup of 254 repositories. The level of productivity loss is at a scale I don't think can be adequately described in words.
Too small and you end up needing to deploy 5 apps to get your feature out and tests are difficult to coordinate.
Too large and the huge CI suite takes forever, people step on each others toes and it can be difficult to get things done. Access control and permissions is also difficult if you ever want to transfer ownership to another team.
The happy middle ground is to follow Conway's law a little and have a few related apps/modules belonging to the same team in one repo, with a CI that has a simple integration test for all of them. It's fairly natural to achieve this if you don't have people following cargo cult memes.
I would like for any article insisting I adopt any practice to put a little more effort into doing so. This seems to amount to "it's good", "I should have done it sooner", "it's not so bad, trust me", and "smart people like it, too". Could have been a tweet.
Probably disagree - Google has built a boat-load of infrastructure to enable a mono-repo which you don't have and which (I don't think) is even available, either OSS or commercial. Use modules/libraries as an alternative to separate repos. This may require a few more commits for distributed changes, but that isn't hard and forces you to consider realities like "what if A service is deployed before B service?", which happen whether or not you're using a mono-repo. Also use data-lakes for data or generated artifacts (something based on S3 or an S3-alike).
My understanding is that Nrwl's Nx is built by ex-Google employees, that their integrated repo style is based on the approach used at Google, and that it's open source (they make money with their paid build-caching solution).
Most companies aren't Google. So, the fact that they can't run the exact same tools doesn't seem like a real blocker when alternatives exist.
I've used Nx a bit before and intend to for my next project. Even as a solo dev initially, I'm optimistic it will help me keep things organized and catch potential issues with shared code via the dependency graph.
It's just one piece of the puzzle though. Have you noticed how companies running Bazel usually have a dedicated team? Many orgs are absolutely not interested in that.
Earlier in my career I was so excited to learn about microservices, as I was working with a huge monolith at the time. Then I went a bit overboard, and for a new prototype I had all these services in different repos and running independently. It was all very cool and I felt very accomplished, only to later (and now painfully obvious) be pointed out it was completely unnecessary, to the point each function basically was its own deployment. But not in a way that made any sense. If you were looking for the _most_ expensive way of running things, maybe.. hah
Tunnel vision is a powerful thing, and if this is how people are splitting repos nowadays, it seems I might not have been the only one chopping up a perfectly good dish only to be left with the individual ingredients.
You can have microservices and different repos, microservices and a monorepo, or a monolith and monorepo. In principle you could even have a monolith split across different repos (e.g. each library in a separate repo). What's included in a deployed binary has nothing to do with how the source code is structured.
Yes, that's not lost on me; my point, however, might have been.
The point was, when learning this by yourself, you can really go down the wrong path with the right advice at the wrong time. This seems to hold true for the monorepo discussion as it did/does for microservices.
For me, this boils down to a question of "what set of problems do you want to solve?" and "what set of problems do you want the system to solve for you?"
To this end, I am generally a monorepo proponent. For the systems that I work on, the problems of "what gets built" and "how do you version the components" are an easier set of problems to solve than "how do I ensure that I don't break anything else when I update this method?" and "if I do break something when I update the method, where is everything that needs to change?"
I've had projects that were fractured across half a dozen repos where libraries were versioned and tagged and deployed, and then the next downstream was updated to use the newly tagged version... and something broke. This was especially bad when the "something broke" was part of another team's code or it had been thrown over a wall. A bunch of changes, many PRs - some of which are just updating a version to pull from the artifact repo.
When things were updated, I took the opportunity to move all of those projects into one monorepo and I only had one PR to review - that passed all the tests for all of the projects contained within the monorepo.
Yes, you still have deployments and tagging and the "well, just because you revved X to 2.3.0 doesn't mean that Y is at 2.3.0 too" problem (strict semver can get a bit on the messy side).
It's a question of which problems do you want to solve. I tend to find that me solving the "how do I set up the ci file and coordinate version numbering" is an easier problem to solve than "all these builds need to be updated in a bunch of different repos and one of them breaks."
Someone gets a few lines of code under their belt and now they overwhelmingly know what is best for all.
It's simple, if it gets released together, put it in the same repo (even if it's a big repo like the Linux kernel). If not, separate it out so you can stay sane.
If it's not released together but lives in one repo anyway, then it's a monorepo, also defined as a big ball of spaghetti.
I have my doubts about the mono repo approach. From the top of my head:
1. It might increase the complexity of other processes on that repo: CI/CD configuration, makefiles, branching strategies, codeowners, etc.
2. The versioning might lose its meaning
3. Blast radius in case of screwing things up accidentally.
I personally split repositories based on responsibilities: here the code base, here the IaC, here the configuration, here the manifests. Always using a standard and predefined naming convention for the repository names. That being said, as always, it depends. I might embrace the monorepo if the context demands it and it has been properly discussed and evaluated.
It hasn't been like that most of the time in my experience. You have to integrate into the same CI/CD pipeline, say, from the code base side: linters, tests and builds; from the IaC side: formatting, validation, plan; from the manifests side: YAML linters, builds (e.g. kustomize), dry runs, etc. Now think of all the previous stuff, but also add the logic for something common like "based on the branch, run this and not that over this environment". Of course you could separate the previous into different pipelines, but then you might have to somehow control the order of execution of each. It is possible; however, it might be complex. Also, one must consider that sometimes reaching a consensus between devops and developers can be tricky, and you might end up with people stepping on each other's toes.
> 2. Versioning becomes easier
How so? From my point of view, if I find a repository containing one specific thing (e.g. a Terraform module) and I see the release version "1.0.1", I get a clear idea of what it implies. However, if that module is versioned along with the Terraform files themselves as well as the code base, what was fixed in that "1.0.1" version? The Terraform module? Something in the app itself? Of course, I can go to the release notes and spend some time reading them, but you'd better be following some good practices in that regard, otherwise it's going to be time consuming.
> 3. Blast radius is similar but cleanup is easier
I could buy this one. However, is it preferable to assume the risk because of how easy it is to clean things up in case they go south, or is it better to avoid the risk in the first place by having everything separated from the start?
I guess context (complexity of the project, culture, communication across teams) is the key.
Here’s why I’ll never go back to anything but a monorepo.
My previous org decided on this for their source control/build dependency layout:
1. Monolith, home to front and back end, artisanal handcrafted openapi spec that plugged into nothing and was just for show.
2. Hand-painted service-cli library, which was supposedly going to be open source for customers but never was, and only mattered to the front end, back in the monolith. We manually kept it in sync with the backend, to everybody’s benefit (YFR). Hope you see it: backend changes left the monolith, went into the service-cli, and came back into the monolith. Face, meet wall.
3. A knapsack full of tiny libraries which all could have been wrapped up in one. They were all dependencies that plugged into the monolith and the service-cli. And they all had subdependencies.
4. Everything version-wise was synced via private npm registry. So if you wanted to make a change that went into the next version of monolith, but the code had to live in a leaf dependency way down at the bottom, you had to run a relay race marathon of red-tape PRs that you had to bribe your coworkers to pay attention to and/or approve.
The entire time, several of us on the team were begging the core team to repair this and they basically told us “no we believe in the tenets of sadism, now take your allotted pain, feature-bitch.”
Even in a private monorepo, Google has tools like Copybara that make it somewhat easy to 'open source' small pieces of it and sync them in and out of a 'public' repo. The benefits of a monorepo and just having to 'git checkout' and your entire environment 'just works' (I'm looking at you, scripting languages and build systems) is not to be taken lightly.
Where I work we just package everything (nugets, python packages, npm) on our Artifactory. Contracts dependencies (DLLs, protobufs) are also distributed as packages. We made it easy to fetch and test the source and allow developers to develop, debug and test those dependencies with their own project if needed.
Every time we try to assemble repositories in macro-repos we always end up regretting it. Multiple dedicated repositories allow autonomy for teams and enforce modularity and coding as a library. Monorepos have a tendency of becoming huge merge trains, easily and often derailed, with lots of fear of being blamed for stepping on someone else's toes.
We update often all our projects knowing full well that not doing so is just borrowing development time at high interest rate.
As a side-note when we do have to do an assembly of different code base, we use git-subrepo: https://github.com/ingydotnet/git-subrepo which provide the best of both submodules and subtree.
Do not listen to pithy blog posts. Seriously. There's too many factors to consider that aren't in the blog post. They get upvoted on HN here quickly, and most often when they are written from ignorance, which seems super confident, and leaves out all the detail and nuance that could make such an upvote questionable. I'm pretty sure the upvotes are based on an emotional reaction to the title alone. (The "reasonable drawbacks" part of this post should have been like 5 pages; and the same for multirepo)
If you really want to know whether you should use a monorepo, first go buy a book about them. Then talk to senior leaders in organizations that have gone through an entire growth cycle and migrated from multirepo to monorepo or vice versa. The random dude on the random blog post doesn't have the perspective to inform you properly about the implications. Hell, I've done the switch twice, and I still wouldn't call myself an expert.
The whole thing really comes down to this difference: If you want to change one piece of software, it's faster and easier with multirepo; If you want to change many pieces of software, it's faster and easier with monorepo.
All a repository really is, is a collection of files in one "bucket". No magic involved, no secret sauce, no woo-woo philosophical jargon to intone, no technical mastery to obtain. There are other features of a repository, like branches, tags, subrepos, LFS, etc. But you don't even really have to use any of those things. At its core it's just files in buckets.
The complexity comes in once you have to start working with these buckets of files. How do I change a lot of files in one bucket? How do I change a lot of files in a lot of buckets? When I want to perform an action with the files ("build", "test", "deploy", etc), in one or many buckets, how do I do that?
How do I do these things with multiple people simultaneously changing the same files? Or files that depend on files that depend on files? How do I coordinate the subsequent actions of the first actions on the buckets on the files with multiple people testing multiple changes simultaneously?
No matter what option you choose, there will be choices you have to make which require compromise. You will eventually have to work around the complicated consequences of these compromises. No matter what you choose, there is always a way to make things easier, but it always requires work to get there.
So really the question isn't monorepo or multirepo. It's what actions do you want to make easy first, and what consequences you want to spend engineering effort to fix first. But at the end of the line, assuming your product exists long enough, everybody ends up with basically the same system, because all of them, when people use them in complex ways, require complex solutions.
Anecdotally, as a developer I really appreciate a well-thought-out monorepo. All the code is there. You don't have to dig around anything legacy or outdated. Searching is a breeze and it's _fast_. Updating both simple and complex logic is equally easy. You don't need to somehow discover that some other repo uses the thing you changed. Everything has the same deployment process. The tests test the thing, as soon as you push it - instead of months down the line, when someone finds forgotten repo again and oh, it's never been tested with the new functionality in our main repos. Dependency hell doesn't really exist.
There seems to be more discipline around monorepos because they're collectively owned, whereas many repos across an org each become someone's baby, and practices diverge until teams in the _same company_ fork each others' repos because they don't get along.
This is the first I've heard of "microrepositories", but it appears the author may have just been overzealous with partitioning their repositories and has now settled on what most of us would just call a "repo".
How do monorepos work with something like gitlab or github, which expect a CI file in the repo root?
My workplace has dozens of repos, and they all have separate CI needs and test suites. Trying to shoehorn them all into a single set of CI pipelines would be very difficult, even with the include and parent-child pipeline features.
I have the most experience with Nx and can only speak to that. Nx projects are tied together with a dependency graph. The workflow/pipeline replaces something like "npm build" with "nx affected --target=build --base=main". Nx compares the code to the base branch, identifies the leaves of the tree (projects) affected by the code change, and runs the specified task for each project.
Each project can specify what that task (technically the "target") means in its own context by directing it to run a specific "executor." For some projects, "build" may run a Webpack executor; others may map it to run a Vite executor.
Edit: To summarize, if you change something deep down in the tree like a design token, you can run the "test" and "build" tasks for everything affected by that change to make sure none of your apps break. If those projects are SPAs built in different frameworks, each can map the task to run the framework-specific "test" or "build" executor. Now you have automatic assurance that all your apps still work. If that sounds like a lot for the CI to do, that's where distributed caching of the task results comes in.
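As a concrete sketch of the commands involved (targets and branch names are whatever your workspace defines; this assumes a reasonably recent Nx):

  # test and build only the projects reachable from the changed files
  npx nx affected --target=test --base=main
  npx nx affected --target=build --base=main

  # visualize which projects a change actually touches
  npx nx graph

In CI you would typically point --base at the merge-base with the default branch so only the delta of the PR gets rebuilt.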
on:
  pull_request:
    types:
      - opened
    paths:
      - '.github/workflows/**'
      - '.github/CODEOWNERS'
      - 'package*.json'
That triggers when those files change.
The "get them all to run in a single repo" is the challenge of the monorepo. The trade off is "if you change a common library, how do you trigger the test suites in all of the downstream repos to run with the new common library and verify that they still work?"
With a monorepo, that second part is solved for you... at the cost of needing to solve the problem that you're asking about.
Having worked for a company that had a monorepo approach and seeing what happens when it hits its apex of issues I don't think I would ever advocate for one again. I'm sure it works for some companies, but when it goes wrong it can literally destroy an entire company.
This might be an unpopular opinion, but to me, this is just an unsolved problem. Between polyrepos and monorepos, I’m not convinced either of them is a clear winner. I’ve used both structures and as a project grows, I dislike them both for different reasons.
To all the folks thinking that this project is too small to be representative of the benefits of monorepo -- both real world success stories and theory point towards monorepo being superior.
I've seen this at several orgs. A polyrepo runs into dependency hell, so they go to microservices. Now you have dependency hell and microservice hell. Furthermore, your microservices have APIs that need to be updated, so each change needs to be done in multiple microservice repos. So to solve one problem you have now created three problems.
Monorepos reduce dependency hell to a source control problem. The vast majority of monorepo issues people encounter are due to using 'git', which is why Google and Meta do not use 'git' at all.
Polyrepo is literally the only way you can run open source projects. Particularly in academia. People constantly confuse the open source world for how they should run their private companies.
The evidence from the most successful software companies runs directly contrary to this model.
My 2 cents: monorepos try to solve what is actually a series of devops problems, by essentially abusing the way an organization or individual uses git (or the VCS of your choice), tools that were never designed for devops in the first place.
To elaborate: I think you are much better off, whether as a team or solo developer, developing small, isolated, and tested / reliable libraries and / or packages that can be combined as needed. Ideally, each package or library can build, test, and ship itself at the CI/CD level. Trying to encapsulate all that into one giant conglomerate of potentially different languages, file types, frameworks and much more sounds like exactly what it is: a mess!
Developing components like this puts a lot of pressure on the components to each look like a piece of Real Software.
Versioning, stability, changelogs, coordinating new features as beta releases followed by deprecation announcements and finally deprecation. None of this is unreasonable, technically — my Linux OS is made of many small stable projects after all, none of whom communicate with each other beyond broadcasting NEWS.md etc with each release.
It’s also exhausting. Do your competitors hamper themselves by cosplaying a bazaar of open source projects coordinated by whatever your internal equivalent of the Debian project is? No! Move everything under one roof to truly reflect your org structure and take advantage of the fact that you all report to one CTO, you all have real-time communication between each other, and you probably know where everyone else is sat for eight hours a day. Move fast.
> take advantage of the fact that you all report to one CTO
How many companies even have a CTO that knows or cares about version control? In most companies version control is managed by IT if at all, and code for different products is split across different lines of business.
> These tools that were never designed for devops in the first place.
No. Git was specifically developed to manage the Linux kernel project, a famous example of a monorepo.
Every instance of polyrepo architecture I have seen in the wild, comes with some kind of bolt on system which emulates a monorepo. Such as: a top-level manifest file which pins the versions of every repo to a known-good value. With the understanding that if you were to pick any other combination of versions, You Are On Your Own.
My current company does this and it's quite unpleasant. I call it "monorepo as a service".
> Git was specifically developed to manage the Linux kernel project, a famous example of a monorepo.
Is Linux a monorepo? It's one project with a single release artifact. To me, Linux just looks like a big project. I'd define a monorepo as containing more than one project (e.g., if Linux and Git shared the same repo).
Huh? Monorepo refers to how you store your source code, regardless of how many binaries you build and run, not to shipping your system as one giant monolithic binary.
No! I'm tired of being told how to architect systems because other folks refuse to grow their knowledge.
Because I'm good at my job, a monorepo would be pants-on-head stupid. Because I know how to make separate codebases interact with one another successfully, it would be a massive mistake to lose those advantages, all because you (not me) run into problems when you try to do what I find perfectly natural.
And if you work with me, you will learn, and I will guide you, and in time you will experience systems design as I do, and no longer be shackled to only doing the things you understand today.
But no, "do this because I don't know how to do that" will never be a winning argument.
Loved this comment. We use multi-repos and the only drawback was at the beginning, when we were deciding our linter rules and had to make multiple PRs (or when some other library had to be stabilized first). Other than that, it has never annoyed me in any other way. Independent CI/CD configuration, independent versioning for each repo and independent commit logs are some of the reasons I like it. Maybe everything I mention is simply a tooling problem, but until that tooling exists for us mortals and not just internally for Google, I will stick with multi-repos. I understand the reasoning behind monorepos but it simply is not enough to persuade me at the moment.
The only thing that I would like us to solve with a tool when it comes to the multi-repos, is to force which versions of the separate repos are compatible together. For the time being it's manageable without automation but it's definitely a high priority.
I myself am skeptical of monorepos, and I have only worked with polyrepos, but I am curious: what strategies do you use that help make you effective with multiple code bases?
Keeping the spirit of microservices alive is super duper important, so not entwining two services with shared code or responsibilities is key, and forcing interaction to happen through their defined APIs.
Yeah, that's pretty much what I do as well. Shared libraries get their own repo as well and are published in a package manager. Honestly, in terms of enabling concurrent development across multiple teams while simply using common OSS software, polyrepo seems a lot easier.
In my experience you must write more tooling to get the monorepo to behave as you'd expect, and being able to use off-the-shelf tools for CI/CD and orchestration is a huge win vs. having to convince those tools that deploying both the periodic service workers as well as an HTTP backend is actually Totally Normal when it's not.
I have worked at places that have micro-repos, places that have polyrepos, and places that have monorepos. From my observation monorepos are a sign of ossification and resistance to new patterns and new technologies.
IMO it's not really a monorepo. There's a single codebase really. The author hasn't hit the real monorepo pain points yet: git operations slowing down, CI not handling the scale of commit events from a single source, instability of the codebase due to many parallel changes, painful merges, etc.
The difference between 150K LOC and Google's largest codebase in the world is huge and AFAIK google has a monorepo.
From my experience real monorepo requires a lot of tooling and maintenance.
I was curious why it was a big deal either way for a smaller codebase/documents to use a monorepo or not. It's mentioned:
> You may be reading this short essay and thinking: "Yeah, no shit! Of course you do not need a monorepo when you're talking about, like, 150K lines of code."
> Maybe it's some abstract sense of purity; maybe it's because you didn't realize most IaaS let you deploy from a subfolder. Whatever the reason is: trust me. Move it into a monorepo. If you do and you end up regretting it, [email me] and I'll donate to the charity of your choice.
So the new thing being said here is that a monorepo makes sense even at smaller scales. This seems to be on the far other end of the scale, with a main app, supporting infra, docs, marketing, etc.
I've certainly used fullstack repos with backend and frontend together and it's fantastic to use and deploy from commits. I can't infer from this article or my experience what it would be like to work with a monorepo at medium scale say 10-100 apps with 10s of teams. That would take more adaptation than simply "deploy from folder". Dependencies as mentioned in another comment and being able to detect if each target should be retested/deployed seems like the tough one.
I said 10-100 apps, and should maybe have said 5-10 teams rather than 10 teams, but 10 dev teams in a company to me is medium-sized. Or do you mean it's less than medium-sized?
One problem he didn't mention with the poly-repo approach is that repositories are like rabbits: as soon as you have more than one they begin breeding. Repositories, plural, beget more repositories.
The problem of course is that the only way to share code between repos is to create another repo. As the saying goes, once you have two things, you will soon want three.
"Oh, repo B needs to use some code in repo A. We'll just split that out into a separate repo."
Be very wary of this impulse. Soon, you end up in the situation I'm in now, where you have hundreds of tightly coupled repos, some with a single file! Worse, these all compose at runtime into only one application running on one host.
Consider instead that perhaps repo A and repo B should be joined. Or, (yes I dare say it), that perhaps the code should be duplicated.
This bugs me about Java software. Instead of having a single library, the library maintainers create some interface and then 30 more jars that implement it - xwidget-dropbox, xwidget-google-drive, xwidget-onedrive, etc... It's just really stupid. Everyone just pull down xwidget and move forward. Same goes for npm packages and front end things, rely on your treeshaker and ES6
You have a monorepo anyway, and the choice is between dealing with it via automated or manual wrappers around tools like git (e.g., manually cloning relevant repositories and constructing a build environment referencing all the right versions appropriately) or using dedicated monorepo tooling.
That's still maybe an interesting question, but when appropriately framed it highlights that the real issue is that monorepo tooling may or may not fit your use case well, and you might be better served by hacking together more primitive solutions or making your own tooling.
IMO, the major point of the article still stands; git and whatnot are pretty good, and the build/deploy tools interacting with your code are probably terrible, so absent major outside pressures you'll go a long ways figuring out how to leverage git (in a monorepo configuration) against jenkinbuckethubernetes, and that's often much less painful than inevitably figuring out how to marry two ancestrally disjoint codebases.
Does this discussion not also apply to most mono vs poly scenarios? Monolith vs microservices comes to mind. The infamous monolith kernel vs micro kernel discussion between Tannenbaum and Torvalds. Even small companies vs large ones.
We shift the complexity and pain to where we are most comfortable with it. Or where we see the best benefits from the trade off.
Another big disadvantage of monorepos is the size. If you're having to clone the repo in its entirety on a routine basis (think CI/CD pipelines, or there's some other repo that needs to maintain multiple work trees referencing different commits), that repo being some massive monstrosity can be a real pain in the ass.
I mean, the only ways to avoid a monorepo are either submodules or a package manager. Do devs really think these are better than a folder?
Having used both they both introduce unneeded complexity and get in the way of actual development. If you use a sensible and modern CI they all support a monorepo flow.
Once you reach a certain scale in a particular type of company the repo just becomes another type of org unit. Your giant fish swallows a minnow, it looks good on paper, someone does a half arse integration, then that repo becomes the one. The one that your manager says, under no circumstances no-one commits to, because then it's OUR repo (with whatever baggage that brings)... The org depends on the code, but the org actively resists it like a virus. This kills any attempt at the mono repo at any largish non FAANG org.
People keep raving about google3, but it had recurring problems with changelists (far outside your own project) that break the world and stop you from testing or deploying anything new for hours. Eventually they had to give up on using everything at head, and instead use known-good versions of major shared dependencies that were a few hours or days behind head.
I want to bump our dependencies when we have the slack to handle the risks. Alpha testing every commit of every library is not really how our team is supposed to be spending our time.
Most opinions about monorepos conflate monocultures in source control with monocultures in build systems or workflows. Those two considerations are technically orthogonal. You can have one big repo with N build systems (like a node app and a python app in the same repo with disjoint workflows). And you can have N repos that have a common build system (each can combine into one big build with cloning and/or submoduling).
I find most of these conversations are more interesting when people decide to make that distinction.
This is a major sticking point for us. Many of our repos are 3rd party, made available to us under different licenses. Some require individuals to be licensed, and so we use ACLs for both read and write access. Some of our source is export-controlled and this also is enforced via ACLs at the repo level.
For your 2nd drawback, I can recommend checking out git subtree. It allows 2 way split and merge of any subtree of a repo. You can easily open source part of your repo in a “fork” and keep it up to date with your mainline as well as incoming pull requests with very little effort.
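A rough sketch of what that looks like (the prefix path and public remote here are hypothetical):

  # carve the subdirectory's history out onto its own branch
  git subtree split --prefix=libs/widget -b widget-export

  # publish it to the public mirror
  git push git@github.com:example/widget.git widget-export:main

  # later, merge accepted public PRs back into the monorepo's subdirectory
  git subtree pull --prefix=libs/widget git@github.com:example/widget.git main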
I couldn't agree more with the author, there's nothing worse than the self-flagellation of having a single change in one repository triggering a snowball effect where you have to update a dozen of other repositories, open another dozen pull requests to be approved, and carefully time releases so you don't break anything.
So the author seems to have a problem with premature optimization. It's not that you should use a monorepo for everything; it's just that when starting out you will probably be better off with a single repo, and then make a separate one if the need arises.
Anyone have resources on how to actually perform the migration of many repos to a monorepo? Currently facing the same issue and want to make the consolidation move, but not sure how to merge the git history. This article and the previous one it links to are missing all the implementation details.
It takes a bit of time. You’d want to do it one by one. I’ve done it the other way (mono repo code to library) and it’s worth the time to keep the history around where you are working. I just did a google search. First one that comes up tonight is this article: https://medium.com/@ayushya/move-directory-from-one-reposito...
Don’t, keep things simple. Maintain a separate archive of each individual repository and any history can be referenced in that repository when necessary.
I just finished a migration like that, and after some research ended up writing shell code to implement 3 methods and put them to the test. Maybe it is useful for you so I'll leave it here:
My first step was to learn the procedure that the devs behind GStreamer used recently, when they went ahead and decided to merge all their disparate repos into a single monorepo. For my needs it is the least adequate of the three, because it will break as soon as there is any collision between the desired destination and the already existing files of a repo.
The second method is one proposed in the JDriven blog. It doesn't suffer collisions because the movement into destination subdir of each repo is done separately on the original repos, and then they are merged with everything already in place.
Lastly, I found a third method which is the one I liked the most. It uses git subtrees, to perform the move and merge all in the target repo (the monorepo itself), without touching the original repos (unlike the JDriven proposal), so everything is tidy and left in place.
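For reference, the subtree variant boils down to something like this, run from inside the monorepo (remote name, URL and prefix are made up):

  git remote add projA git@github.com:example/projA.git
  git fetch projA
  # import projA under projects/projA, carrying its full history into the monorepo
  git subtree add --prefix=projects/projA projA main

Repeat per repository; each import comes in as a merge whose second parent is the old repo's history, so nothing is lost.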
How many merge conflicts is it going to introduce with just a team of 4?
Monorepo also means merging both frontend and backend into one, even with totally different languages? Should be fine with a traditional web framework, but how does it work for an API + SPA setup?
Meta, Google, Microsoft, Twitter -- all of these companies use a monorepo. I wonder if taking on the monorepo (and managing it successfully) actually has something to do with companies becoming successful.
[0] https://engineering.fb.com/2022/11/15/open-source/sapling-so...