
As a big fan of the monorepo approach personally, I would say the biggest benefit is being able to always know exactly what state the system was in when a problem occurred.

I've worked in large polyrepo environments. By the time you get big enough that you have internal libraries that depend on other internal libraries, debugging becomes too much like solving a murder mystery. In particular, on more than one occasion I had a stacktrace that was impossible with the code that should have been running. A properly-configured monorepo basically makes that problem disappear.

This is more of a problem the bigger you are, however.



I think at some point we're just reducing this down to "programming at scale is hard".

Sure; that is a really big problem, and it becomes a bigger problem the bigger you are. But, as you become bigger: the monorepo is constantly changing. Google has an entire team dedicated to the infrastructure of running their monorepo. Answering the question: "For this service, in this monorepo, what is the set of recent changes" actually isn't straightforward without additional tooling. Asking "what PRs of the hundred open PRs am I responsible for reviewing" isn't straightforward without, again, additional tooling. Making the CI fast is hard, without additional tooling. Determining bounded contexts between individuals and teams is hard, without additional tooling.

The biggest reason I am anti-monorepo is that advocates will stress "all of this is possible (with additional tooling)", but all of it is possible, today, just by using different repos. And I still haven't heard a convincing argument for what benefits monorepos carry.

Maybe you could argue "you know exactly what state the system was in when something happened", sure. But when you start getting CI pipelines that take 60 minutes to run, or failed deployments, or whathaveyou; even that isn't straightforward.

And I also question the value of that; sure you have a single view of state, but problems don't generally happen "point in time"; they happen, and then continue to happen. So if we start an investigation by saying "this shit started at 16:30 UTC", the question we first want to have answered is "what changed at 16:30 UTC". Having a unified commit log is great, but realistically: a unified CI deploy log is far more valuable, and every CI provider under the sun just Does That. It doesn't mean squat that some change was merged to master at 16:29 if it didn't hit prod until 16:42; the problem started at 16:30; the unified commit log is just adding noise.


You had the wrong tools. It doesn't matter if you have a monorepo or not, you will need tools to manage your project.

I'm on a multirepo project and we can't have that problem because we have careful versioning of what goes together. Sure, many combinations are legal/possible, but we control/log exactly what is in use.


> Sure many combinations are legal/possible, but we control/log exactly what is in use.

I'll acknowledge our tooling could have been better, but isn't it better to just be able to check out one revision of one repo and have confidence that you're looking at the code that was running?


It depends on your architecture.

If I have a services based architecture then I can jump straight to the repo for that particular service and have confidence that it is the code that is running.


So instead of adopting a system that makes the problem we’re discussing not possible you use a human-backed matrix of known compatible versions?

Like you do you but I’ve never seen “just apply discipline” or “just be careful” ever work. You either make something impossible, with tooling or otherwise, or it will happen.


No, it is a tool-backed matrix. Illegal combinations are not possible, and we have logs of exactly what was installed, so we can check that revision out at any time.


To solve this properly you need to store the deployed/executed commit id of any service. That could be in the logs, in a label/annotation of a Kubernetes object, or somewhere else. But this has nothing to do with whether you use a monorepo or multiple smaller repositories.

In some of my projects, we use the commit of the source repo as the docker tag, and we make the docker image build as reproducible as possible. I.e. we don't always build with the latest commit of an internal library, but with the one that is pinned in the dependency manifest of our build tool. Since updating all those internal dependencies is a hassle, it's automated: an auto-generated merge request updates each dependency for every downstream project, so all the downstream pipelines can run their test suites before an update gets merged. Once in a while that fails, and a human has to adapt the downstream project to its latest dependencies. In a monorepo that work has to be done as well, but for all downstream projects at once.
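A minimal sketch of the commit-as-tag idea (the registry and service names are made up, and the throwaway repo exists only so the sketch runs anywhere):

```shell
# Sketch: use the source commit as the image tag, so any running container
# maps back to an exact checkout. Registry/image names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "initial"

commit=$(git rev-parse --short HEAD)
image="registry.example.com/orders-service:${commit}"
echo "would build and push: ${image}"
# docker build -t "${image}" . && docker push "${image}"
```

Any log line or orchestrator label carrying that tag then points straight back to the commit that produced the artifact.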


Could it be that submodules are underused?


Submodules are hell. I work somewhere with a polyrepo topology, with the inevitable "shared" bits ending up integrated into other repos as submodules. Nothing has been more destructive to productivity and caused so many problems.

A plain old monorepo really is the best.


Git submodules are really a PITA.

The fact that git checkout did not update submodules was a major design flaw in my opinion.


It can now, but that's not the default. The defaults for submodules suck, because they match the behavior of old versions of git for backwards compatibility.
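For reference, the opt-in looks like this (a sketch; the throwaway repo is only there so it runs anywhere):

```shell
# Sketch: opt in to submodule-aware checkouts (git >= 2.13).
# This is off by default for backwards compatibility.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config submodule.recurse true   # per-repo; add --global to apply everywhere
git config submodule.recurse        # prints: true
# One-off alternatives that don't touch config:
# git checkout --recurse-submodules <branch>
# git clone --recurse-submodules <url>
```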


Yeah. UX issues aside: don't ever use submodules to manage dependencies inside each polyrepo, or you will eventually accumulate duplicate, conflicting, and out-of-date sub-dependencies. Package managers exist for a reason. The only correct way to use submodules is a root-level integration repository, the only repo that is allowed to have submodules.
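A minimal sketch of that integration-repo layout (all names are made up, and a local stand-in repo replaces what would normally be a remote URL):

```shell
# Sketch: a root-level integration repo is the only repo with submodules,
# pinning each component at an exact commit.
set -eu
work=$(mktemp -d)
cd "$work"
# Stand-in for a component repo (normally a remote URL):
git init -q shared
git -C shared -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "initial"
# The integration repo pins the component at an exact commit:
git init -q integration
cd integration
git -c protocol.file.allow=always \
  submodule add "$work/shared" components/shared
git -c user.email=ci@example.com -c user.name=ci \
  commit -q -m "pin components/shared"
git submodule status components/shared   # shows the pinned commit
```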


The only problem I have with a monorepo, is that sometimes I need to share code between completely different teams. For example, I could have a repo that contains a bunch of protobuf definitions so that every team can consume them in their projects. It would be absurd to shove all of those unrelated projects into one heaping monorepo.


Well that's what a monorepo is! I work on one, it's very large, other teams can consume partial artifacts from it (because we have a release system that releases parts of the repo to different locations) but if they want to change anything, then yeah they have to PR against the giant monorepo. And that's good!

Teams out of the repo have to be careful about which version they pull and when they update etc. However, if you are a team IN the monorepo, you know that (provided you have enough tests) breaking changes to your upstream dependencies will make your tests fail which will block the PR of the upstream person making the changes. This forces upstream teams to engage (either by commits or by discussions) with their clients before completing changes, and it means that downstream teams are not constantly racing to apply upgrades or avoiding upgrades altogether.

I work on shared library code and the monorepo is really crucial to keeping us honest. I may think for example that some API is bad and I want to change it. With the monorepo I can immediately see the impact of a change and then decide whether it's actually needed, based on how many places downstream would break.


Ok. I've had some time to think about this, and I am warming up to the idea. It would sure simplify a lot of challenging coordination problems. My only real concern is that the repo may grow so large it becomes very slow. Doubly so if someone commits some large binaries.


It does become slow eventually, and yes you need discipline and tooling to block people from dumping everything in it.

You do need a lot of code / developers before you outgrow git; cf. the Linux kernel.


I’m a git nerd and even I struggle with the submodule UI, there are probably a lot of people who just can’t deal with it.


I am certainly not a heavy user, but for work I've made myself a "workflow" repository which pulls together all the repositories related to one task. This works super well. There sure is a bit of weirdness in managing them, but I found it manageable. But I'll admit that I don't really use the submodules for much more than initial cloning, maybe I'd experience more problems if I did.


Yes, but it's because submodules are a badly architected, badly implemented train wreck.

There are many good and easy solutions to this problem, all of which were not implemented by git.

git is a clean and elegant system overall, with submodules as by far the biggest wart in that architecture. They should be doused with gasoline and burned to the ground.


I like using submodules for internal dependencies I might modify as part of an issue. I like conan or cargo for things I never will. I don't particularly like conan. Perhaps bazel, hunter, meson or vcpkg are all better.


> internal libraries that depend on other internal libraries

This is where you start to develop nostalgia for well-structured monolithic apps.


I can check out a git revision and the library dependencies will be handled transparently by the package manager.

No doubt this is possible with service approach, but it means additional layers of complexity added on top.


This should happen on monorepos as well as per-service repos. So it is not an argument for either side of that discussion.


But this is a discussion of dependencies between services. You need more tooling for managing inter-service dependencies as opposed to package dependencies within one monolith.


> I've worked in large polyrepo environments. By the time you get big enough that you have internal libraries that depend on other internal libraries, debugging becomes too much like solving a murder mystery. In particular, on more than one occasion I had a stacktrace that was impossible with the code that should have been running. A properly-configured monorepo basically makes that problem disappear.

On the contrary, a monorepo can make it impossible, because no single checkout corresponds to what was actually deployed. If what was running at the time was two different versions of the same internal library in service A and service B, that sucks, but having separate checkouts for service A and service B sucks less than trying to look at two different versions of parts of the same monorepo at the same time.


There is no source of truth for "what was deployed at time T" except the orchestration system responsible for the deployment environment. There is no relationship between source code revision and deployed artifacts.


Hopefully you have a tag in your VCS for each released/deployed version. (The fact that tags are repository-global is another argument for aligning your repository boundaries with the scope of what you deploy).
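A sketch of what that could look like (the tag naming scheme is made up, and the throwaway repo is only there so it runs anywhere):

```shell
# Sketch: record every deploy as a git tag, so any incident timestamp
# maps to an exact checked-out tree. Tag names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "initial"

git tag "deploy/payments/2024-06-01T16-30Z"
# Later, during an incident: what was deployed around 16:30?
git tag --list 'deploy/payments/*'
# git checkout "deploy/payments/2024-06-01T16-30Z"   # the exact deployed tree
```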


Of a service, yes. Of the entire infrastructure, no.


Why not? I’m doing it right now. The infrastructure is versioned just like the app and I can say with certainty that we are on app version X and infra version Y.

I even have a nice little db/graph of what versions were in service at what times so I can give you timestamp -> all app and infra versions for the last several years.


Unless your infrastructure is a single deployable artifact, its "version" is a function of all of the versions of all of the running services. You can define a version that establishes specific versions of each service, but that's an intent, not a fact -- it doesn't mean that's what's actually running.


Am I missing some nuance here? Yes the infra version is an amalgamation of the fixed versions of all the underlying services. Once the deploy goes green I know exactly what’s running down to the exact commit hashes everywhere. And during the deploy I know that depending on the service it’s either version n-1 or n.

The kinds of failures you’re describing are throw away all assumptions and assume that everything from terraform to the compiler could be broken which is too paranoid to be practically useful and actionable.

If deploy fails I assume that new state is undefined and throw it away, having never switched over to it. If deploy passes then I now have the next known good state.


Oh, this implies you're deploying your entire infrastructure, from provisioned resources up to application services, with a single Terraform command, and managed by a single state file. That's fine and works up to a certain scale. It's not the context I thought we were working in. Normally multi-service architectures are used in order to allow services to be deployed independently and without this form of central locking.


If what was deployed was foo version x and bar version y, it's a lot easier to debug by checking out tag x in the foo repo and tag y in the bar repo than achieving the same thing in a monorepo.


Of course, but this is entirely possible with a monorepo.


Possible perhaps, but not easy by any means.


Then you should build one. E.g. GitLab can create special git references for every deployment it has ever made.


Our artifacts are tagged with their git commit and build time, which then gets emitted with every log event.
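A minimal sketch of that pattern (the log format and helper function are made up; the throwaway repo is only there so it runs anywhere):

```shell
# Sketch: stamp a build with its commit and build time, then emit both
# with every log line. Format and names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "initial"

GIT_COMMIT=$(git rev-parse --short HEAD)
BUILD_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
log() { echo "commit=${GIT_COMMIT} built=${BUILD_TIME} msg=\"$*\""; }
log "handling request 42"
```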


I'm not sure I understand how that scenario would arise with a monorepo. The whole point of a monorepo is that everything changes together, so if you have a shared internal library, every service should be using the same version of that library at all times.


And every service deploys instantly whenever anything changes?

(I actually use that as my rule of thumb for where repository splits should happen: things that are deployed together should go in the same repo, things that deploy on different cycles should go in different repos)


Not necessarily instantly, but our CD is fast enough that changes are in production 5-10 minutes after hitting master.

But what's more valuable is that our artifacts are tagged with the commit hash that produced them, which is then emitted with every log event, so you can go straight from a log event to a checked-out copy of every relevant bit of code for that service.

Admittedly this doesn't totally guarantee you won't ever have to worry about multiple monorepo revisions when you're debugging an interaction between services, but I haven't found this to come up very much in practice.

Edit: I should also clarify, a change to any internal library in our monorepo will cause all services that consume that library to be redeployed.


Which CD are you using @xmodem?


Buildkite, with our own orchestration layer built on top.


> things that are deployed together should go in the same repo, things that deploy on different cycles should go in different repos

What do you do with libraries shared between different deployment targets?


> What do you do with libraries shared between different deployment targets?

The short answer is "make an awkward compromise". If it's a library that mostly belongs to A but is used by B, then it can live in A (but this means you might sometimes have to release A with changes just for the sake of B). If it's a genuinely shared library that might be changed for the sake of A or B, then I generally put it in a third repo of its own, meaning you have a two-step release process. The way to mitigate the pain of that is to make sure the library can be tested on its own, without needing A or B. As for the case where a library is shared between two independent components A and B but tightly coupled to both, such that it can't really be tested on its own: all I can suggest is to try to avoid it.


If you have a library that is tightly coupled to A and B, then A and B are effectively coupled.

Ergo, put all three into a single repo because you pretty much have to deploy all three together.

The test for the tightness of coupling is to ask whether A and B can use different versions of the library. If not, they are tightly coupled.


That's a great test, and I think an argument for a monorepo for most companies. Unless you work on products that are hermetically sealed from each other, there are very likely going to be tight dependencies between them. Your various frontends and backends are going to want to share data models for the stuff they're exchanging, for example. You don't really want multiple versions of those to exist across your deployments, at least not long term.


I think it's maybe an argument for a single repo per (two-pizza) team. Beyond that, you really don't want your components to be that tightly coupled together (e.g. you need each team to be able to control their own release cycles independently of each other). Conway's law works both ways.


? You can totally have independent release cycles between multiple targets within a monorepo.


If they have independent release cycles, they shouldn't be tightly coupled (sharing models etc. beyond a specific, narrowly-scoped, and carefully versioned API layer), and in that case there is little benefit and nontrivial cost to having them be in a monorepo.


Not GP, but I use versioned packages (npm, nuget, etc) for that. They're published just like they're an open source project, ideally using semantic versioning or matching the version of a parent project (in cases where eg we produce a client library from the same repo as the main service).



