To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.
If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.
The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).
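To make that concrete, here's roughly what a compatibility window can look like for a plain JSON API -- a minimal Go sketch with made-up endpoint and field names, not anyone's actual service. A field is being renamed; the server accepts both names until every client has moved, so neither side ever needs a coordinated deploy.

    // Hypothetical compatibility-window sketch (field names are made up):
    // "quantity" is being renamed to "count". The server accepts both, so old
    // clients keep working against the new server; during the window new
    // clients send both fields, so the old server keeps working too and the
    // deploy order never matters.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    type orderRequest struct {
        Count    *int `json:"count"`    // new name
        Quantity *int `json:"quantity"` // old name, kept until every client has migrated
    }

    func createOrder(w http.ResponseWriter, r *http.Request) {
        var req orderRequest
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        var count int
        switch {
        case req.Count != nil:
            count = *req.Count
        case req.Quantity != nil:
            count = *req.Quantity // legacy path
        default:
            http.Error(w, "missing count/quantity", http.StatusBadRequest)
            return
        }
        log.Printf("creating order, count=%d", count)
        w.WriteHeader(http.StatusNoContent)
    }

    func main() {
        http.HandleFunc("/orders", createOrder)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Once nothing sends the old field any more, it can be dropped in a later, equally independent deploy.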
Consider what happens with a monorepo where parts of it are deployed individually. You can't check out any specific commit and mirror what's in production. You could make multiple copies of the repo, check out a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.
TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.
You can have a mono-repo and deploy different parts of the repo as different services.
You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?
This is spot on. A monorepo can still include a granular and standardized CI configuration across code paths. Nothing about monorepo forces you to perform a singular deployment.
The gains provided by moving from polyrepo to monorepo are immense.
Developer access control is the only thing I can think of to justify polyrepo.
I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.
> If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?
Why would you bother to spend the time figuring out whether or not it needs to get deployed? Why would you spend time training other (and new) people to be able to figure that out? Why even take on the risk of someone making a mistake?
If you make your deployment fast and seamless, who cares? Deploy everything every time. It eliminates a whole category of potential mistakes and troubleshooting paths, and it exercises the deployment process (so when you need it, you know it works).
It's literally automatic by configuring your workflows by file changes.
> If you make your deployment fast and seamless, who cares?
It still costs CI/CD time, and depending on the target platform, there are foundational limitations on how fast you can deploy. Fixing a button in a React SPA and deploying that to S3 and CloudFront is fast. Building and deploying a Go backend container -- that didn't even change -- to ECS is at least a 4-5 minute affair.
I get that, and like anything it's a matter of trade-offs.
To me, in most cases, 4 or even 10 minutes is just not a big deal. It's fire-and-forget, and there should be a bunch of things in place to prevent bad deploys and ultimately let the team know if something gets messed up.
When there's an outage, multiple people get involved and it can easily hit 10+ hours of time spent. If adding a few minutes to the deploy time prevents an outage or two, that's worth it IMHO.
If you don't deploy in tandem, you need to test forwards & backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it'd be simpler with separate repos.
All you need to know is "does changing this code affect that code".
In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both and this can be done simply by detecting roots for modified files.
You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
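For illustration, here's a rough sketch of that "detect roots for modified files" step as a standalone Go program. The directory names, workflow names, and the origin/main base ref are all made up, and in practice a CI system's built-in path filters do the same job:

    // Hypothetical sketch: map changed files to top-level roots and print the
    // set of workflows that should run.
    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // deployRoots maps a top-level directory to the workflows it should trigger.
    var deployRoots = map[string][]string{
        "web":    {"deploy-spa"},                   // React SPA
        "server": {"deploy-backend"},               // Go backend
        "proto":  {"deploy-spa", "deploy-backend"}, // a schema change regenerates the client too
    }

    func main() {
        // Files changed between the branch point and HEAD.
        out, err := exec.Command("git", "diff", "--name-only", "origin/main...HEAD").Output()
        if err != nil {
            panic(err)
        }
        triggered := map[string]bool{}
        for _, file := range strings.Split(strings.TrimSpace(string(out)), "\n") {
            if file == "" {
                continue
            }
            root := strings.SplitN(file, "/", 2)[0]
            for _, wf := range deployRoots[root] {
                triggered[wf] = true
            }
        }
        for wf := range triggered {
            fmt.Println("trigger:", wf) // hand these to the CI/CD system
        }
    }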
You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.
As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.
It is something that can be made better of course, having your CI and CD be a bit smarter and more modular means you can now do selective builds based on what was actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).
But all of that needs to be constructed too, just merging some repos into one doesn't do that.
It's not complex if it's just a 'dump everything in one place'-repository, but the concept of a mono repository is that separate applications that share code can share it by being in the same commit stream.
That's where it immediately gets problematic. Say you have a client, a server, and a shared library that does your protocol or client/server code for you. If you want to bump the server and check that it's still backwards compatible with, say, the clients you've already deployed, the basic "match this stuff" pattern doesn't work: it matches either nothing (a library change builds only the library, not the client or the server), or all of it, because you never made the distinction. And when you want just the new server with the new library and nothing else, you somehow have to build and publish the new library first, reference that specific version in the server, rebuild it, and meanwhile not affect the client. That also means that when someone else is busy working on the client or on the older library, you essentially all have to be on different branches that can't integrate with each other, because they would overwrite or mess up the work of the people busy with their own part.
You could go monorail and do something where every side effect of the thing you touch automatically also becomes your responsibility to fix, but most companies aren't at a scale to pay for that. That also applies to things like Bazel. You could use it to track dependent builds and also tag specific inter-commit dependencies, so your server might build with the HEAD version of the library while your client stays on an older commit. But at that point you have just done version/reference pinning, with extra steps. You also can't see both at the same time in your editor, unless you open two views into the same source, at which point you might as well keep two separate checkouts, one for the library and one for the server.
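For what it's worth, in Go terms that pinning ends up looking something like the sketch below (module paths and versions are made up, and it assumes the shared library is versioned somewhere both sides can pull it from). Each consumer requires its own version of the shared library, which is exactly the bookkeeping you'd have with separate repos anyway:

    // client/go.mod -- stays pinned to the older shared-library release
    module example.com/client

    go 1.22

    require example.com/shared v0.3.0

    // server/go.mod -- already tracks the newer shared-library release
    module example.com/server

    go 1.22

    require example.com/shared v0.4.0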
If your project, organisation of people (teams) and dependencies are simple enough, and you only deploy one version at a time, of everything, then yes, doing multiple directories in a single repo or doing multiple repos is not much of a change. But you also gain practically nothing.
There are probably more options, as I don't doubt it's more of a gradient; the Elastic sources seem to do this somewhat well, but they release everything at once in lockstep. I suppose one way to get separate versions out of that is to always export the builds and keep a completely separate CD configuration and workflow elsewhere, which appears to be what they have done, since it isn't public.
Unless that literally prevents you from accessing other parts of the repository, that creates a big risk that you depend on something from another folder and are not triggering rebuilds when you should.
This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo, trying to understand what change existed where and when seems to become very difficult.
Of course the multirepo approach means you do this dance a lot more:
- Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used; see the sketch after this list)
- Update upstream systems to the new change
- Remove backwards compatibility and pray you don't have a low frequency upstream service interaction you didn't know about
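On the tombstone part of that first step, a minimal Go sketch of what I mean (the names are made up): every hit on a backwards-compatibility path bumps a counter and logs the caller, so before the removal step you can check whether anything -- including the low-frequency callers -- still depends on the old behaviour.

    // Hypothetical "tombstone" helper: every hit on a deprecated code path is
    // counted (exported via expvar on /debug/vars) and logged, so you can
    // check the counters before removing backwards compatibility.
    package compat

    import (
        "expvar"
        "log"
    )

    // One counter per deprecated behaviour.
    var tombstones = expvar.NewMap("compat_tombstones")

    // Tombstone records that a deprecated code path was exercised, e.g.
    // compat.Tombstone("orders.quantity_field", callerService).
    func Tombstone(name, caller string) {
        tombstones.Add(name, 1)
        log.Printf("compat tombstone hit: %s (caller=%s)", name, caller)
    }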
While the dance can be a pain, it does follow a more iterative approach with smaller blast radii (albeit many more of them). But, all in all, an acceptable tradeoff.
Maybe if I had more familiarity with mature tooling around monorepos I might be more interested in them. But alas, it's not a bridge I have crossed, or am pushed to cross just at the moment.
I couldn't disagree more. DevOps probably likes it, but from a software dev standpoint it's not helpful at all. I had to stave off that thinking at my last job (because of devops). It was obvious it was going to be a huge waste of time with a continuing negative payback. It was trivial to essentially keep a description file for our repo of 7 or so components.