Genuine question, because I've never worked somewhere with a monorepo infrastructure: is it really "one repo for all code in the organization" or "one repo for everything related"?
In my organization we have around 70k internal git repos (and an order of magnitude fewer public ones), but of course not everything is related to everything else; we produce many distinct software products. I can understand "collect everything of a product to a single repo"; I can even understand going to "if there is a function call, that code has to be in the same repo". But putting everything into a single place... What are the benefits?
In game dev, a monorepo per product is often used, which includes game code, art assets, the build system and tooling, as well as engine code that can receive project-specific patches. In Perforce, it's organised into streams, where development streams are regularly promoted to staging, then to release, etc.
The benefit is the tooling, as the article mentioned. Everything in the repo is organised consistently, so I can make ad-hoc Python tools relying on relative paths, knowing that my teammates have an identical folder structure.
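A minimal sketch of what that enables (written in TypeScript here even though the comment mentions Python; the paths and file names are invented): because every checkout has the same layout, a tool can locate everything relative to its own position in the tree, with no per-machine configuration.

    // Hypothetical tool living at <repo>/tools/report.ts in every checkout.
    // The repo root is always two directories up from this script, because
    // everyone's folder structure is identical.
    import { resolve, join } from 'node:path';
    import { readdirSync } from 'node:fs';

    const repoRoot = resolve(__dirname, '..', '..');

    // Example paths (invented); the point is they can be hard-coded safely.
    const assetsDir = join(repoRoot, 'game', 'assets');
    const engineDir = join(repoRoot, 'engine', 'src');

    console.log(`Found ${readdirSync(assetsDir).length} assets; engine lives at ${engineDir}`);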
When you have N repos, you also have N ways of managing dependencies, N ways of doing local bin scripts and dev environment setups, N projects with various out of date & deprecated setups, N places to look when you need to upgrade a vulnerable dependency, N services which may or may not configure telemetry in a consistent way, N different CI & deployment workflows…
It just gets very difficult to manage, especially if people frequently need to work across many repos. Plus, onboarding is a pain in the ass.
Monorepo example: if I want to add a new TypeScript package/library for internal NodeJS use, we have a bootstrapping script that sets it up (a rough sketch of the idea follows the list). It basically:
1. Inherits a tsconfig that just works in the context of the repo
2. Jest is configured with our default config for node projects and works with TS out of the box.
3. Linting / formatting etc. all work out of the box.
4. Can essentially reuse existing dependencies the monorepo already has
5. Imports in existing code work immediately since it’s not an external dependency
6. CI picks up on the new TypeScript & Jest configs and adds jobs for them automatically
7. Code review & collaboration is happening in the same spot
8. This also makes it easier to have devs managing the repo — for example, routine work like updating NodeJS is a lot easier when you know everything is using a nearly identical setup & is automatically verified in CI.
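To make that concrete, here is a rough, hypothetical sketch of what a bootstrapping script like that might do. The paths, file contents, and @acme names are invented, not our actual tooling; the point is that everything it writes just extends repo-wide defaults.

    // bootstrap-package.ts (hypothetical): scaffold a new internal TS package
    // so it inherits the repo-wide defaults instead of inventing its own.
    import { mkdirSync, writeFileSync } from 'node:fs';
    import { join } from 'node:path';

    function scaffoldPackage(name: string): void {
      const dir = join('packages', name);
      mkdirSync(join(dir, 'src'), { recursive: true });

      // (1) tsconfig that "just works": it only extends the shared base config.
      writeFileSync(
        join(dir, 'tsconfig.json'),
        JSON.stringify({ extends: '../../tsconfig.base.json', include: ['src'] }, null, 2),
      );

      // (2) Jest wired to the shared Node+TS preset, nothing project-specific.
      writeFileSync(
        join(dir, 'jest.config.js'),
        "module.exports = { preset: '../../tooling/jest-preset-node' };\n",
      );

      // (4)/(5) Register the package in the workspace so existing dependencies
      // and imports from other packages resolve without publishing anything.
      writeFileSync(
        join(dir, 'package.json'),
        JSON.stringify({ name: `@acme/${name}`, version: '0.0.0', main: 'src/index.ts' }, null, 2),
      );

      // (6) CI discovers the new tsconfig/jest config on its next run and adds jobs.
    }

    const name = process.argv[2];
    if (!name) throw new Error('usage: bootstrap-package <name>');
    scaffoldPackage(name);

In a setup like this, linting/formatting (3) and code review (7) wouldn't need anything per-package at all, because they'd be configured once at the repo root.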
One challenge I had to help solve in a previous job was that onboarding was difficult because we had a small number of large repos everyone worked in. The standards were slightly different across them: npm, pnpm, and Yarn were all in use, deployment worked pretty differently among them, CI setups were unique, and each of the large projects had, if not a dedicated team, some number of people spending a lot of time just managing that project's workflows.
So many coordination things just get easier when there isn't an opportunity to get out of sync. If you do separate repos, you can totally share config… but now every tiny update to the shared unit test config costs a release plus a dependency-update PR in each consuming repo. It's just guaranteed to get out of sync, and it's hard to catch issues when you can't validate a config change against all the projects that use it at the same time.
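As a sketch of that cost (package names and versions invented): in the polyrepo setup, each consumer pins the shared config as a published dependency, so even a one-line fix has to be released and then bumped repo by repo.

    // jest.config.ts in each of the N separate repos (hypothetical names):
    import type { Config } from 'jest';

    const config: Config = {
      // Resolved from node_modules at whatever version this repo happens to pin
      // ("@acme/jest-preset": "1.2.3" in its package.json), so a tiny fix to the
      // preset means publishing a new version, then a bump PR in every repo.
      preset: '@acme/jest-preset',
    };

    export default config;

    // In the monorepo, the preset is just a path in the same tree, so the change
    // and every consumer are updated and validated together in one PR, e.g.:
    //   preset: '<rootDir>/../../tooling/jest-preset',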
So, because just keeping multiple repos' setups in sync becomes trickier (and takes real work), you inevitably end up with some “standards” that are loosely followed and a lot of slightly different setups that get harder to untangle the longer they grow. If you can accept the cost of context switching between repos, or if people rarely need to switch, maybe it's OK… until something like a foundational dependency update (NodeJS, TypeScript, React, something like that) needed for security becomes extremely difficult, because you have a million different ways of configuring things and the JS ecosystem sucks.
You CAN have N different setups of whatever in a polyrepo - that doesn't mean you must. You can settle on a company-wide package manager, CI system, build system, and so on. That's what my company has: each repo has its own setup scripts, but they're maybe 100 lines each (and almost all of those lines are redundant and could be consolidated if I spent some time on it).
The above breaks down when we have third-party code, since it doesn't follow our common patterns for building and has to do something different. Bringing it into a monorepo would leave it just as different from everything else.
> When you have N repos, you also have N ways of managing dependencies, N ways of doing local bin scripts and dev environment setups, N projects with various out of date & deprecated setups, N places to look when you need to upgrade a vulnerable dependency, N services which may or may not configure telemetry in a consistent way, N different CI & deployment workflows…
No, you do not, unless you mean N=1. Build scripts, tooling, linters, etc. live in a separate repo and are released and consumed by each individual repo.