Monorepos are much easier for everyone to use, and are the only natural way to m...

anon9001 · on Nov 11, 2021

> Monorepos are much easier for everyone to use, and are the only natural way to manage code for any project.

I strongly disagree with that, but I'll let this blog post explain it better than I can: https://medium.com/@mattklein123/monorepos-please-dont-e9a27...

> You keep talking about Google, but a much more famous monorepo is Linux itself.

I thought it was fairly well known that monorepos came directly from Google as part of their SRE strategy. It didn't even come into common usage until around 2017 (according to wikipedia). If I'm remembering correctly, the SRE book recommends it, and that's why it gained popularity.

Also, I don't believe that Linux is a valid interpretation of "monorepo". Linux is a singular product. You can't build the kernel without all of the parts.

A better example would be if there was a "Linus" repo that contained both git and linux. There isn't, and for good reason.

> The fact that git is very poor at scaling monorepos might mean that it's a bad idea to use git for larger organizations, not that it's a bad idea to use monorepos. If git can be improved to work with monorepos, all the better.

Any performance improvement in git is welcome, but anything that sacrifices a full clone of the entire repository is antithetical to decentralization.

The whole point of git is decentralized source code.

cdcarter · on Nov 11, 2021

I think it's at least somewhat fair to call Linux a monorepo. There are a lot of drivers included in the main tree. They don't need to be, (we know this because there are also lots of drivers not in the source tree). But by including them, the kernel devs can make large changes to the API and all the drivers in one go. This is a classic "why use a monorepo".

solarmist · on Nov 11, 2021

Monorepos (up to a certain size where git starts getting too slow) are easier to use unless you have sufficient investment into dev tooling.

I think "monorepo" here is a shorthand for large, complex repos with long histories which git does not scale well to whether or not it is all of the repos for an organization. For example I'd call the Windows OS a monorepo for all of the important reasons.

dataangel · on Nov 11, 2021

> Also, I don't believe that Linux is a valid interpretation of "monorepo". Linux is a singular product. You can't build the kernel without all of the parts.

But it’s also larger scale than the vast majority of startups will ever reach. My work has had the same monorepo for 8 years with over 100 employees now and git has had few problems.

xorcist · on Nov 12, 2021

> You can't build the kernel without all of the parts.

You most certainly can. Loadable modules have been part of the kernel for over 20 years now.

The fact that many drivers exists out-of-tree should be enough to settle that particular argument.

There are libraries in the kernel perfectly usable on its own. There are also many scripts, analysis and testing tools that live in the kernel that build and runs separately from the kernel itself. Then there's a whole lot of documentation.

Linux is what git what designed for. It sits in a singular respository and no other repositories are needed to build a working product. It can accurately be described as a monorepo, if that particular distinction was important.

tsimionescu · on Nov 12, 2021

I'd also note that Linux doesn't use any kind of dependency management tools between all of its sub-components: everything builds using what Git keeps on disk.

Also, there are no external dependencies: if you want a new library to be used, you copy its source to the kernel source tree - a classic monorepo solution, that detractors claim "doesn't scale".

anarazel · on Nov 12, 2021

Which pre-existing libraries have been integrated that way into the kernel? There's a few tiny pieces, but largely you can't just use pre-existing libs inside the kernel.

howinteresting · on Nov 11, 2021

> The whole point of git is decentralized source code.

The "whole point of git" is to provide value to its users. Full decentralization is not necessary for that.

naniwaduni · on Nov 12, 2021

Providing version control is also not strictly necessary to providing value to users. But no, I'm pretty sure git's whole point is "to provide value to its users".

tsimionescu · on Nov 12, 2021

> I strongly disagree with that, but I'll let this blog post explain it better than I can: https://medium.com/@mattklein123/monorepos-please-dont-e9a27...

That article completely misses the point about project history - a monorepo has the full history of your project, while multiple repos split that history. If you have a good split, everything is fine, but if you're moving code between repos relatively often, everything gets muddled.

It also overstates the need for build artifact management in a monorepo. For 3rd party code that has a good package management solution you can use that, while still using the simpler solution of no dependency management for internal libraries, in the good old C tradition. I will again point to the Linux kernel as a good example of doing this successfully - they don't do any kind of versioned build artifacts for any of the many tens of libraries they use - they just rely on git.

> I thought it was fairly well known that monorepos came directly from Google as part of their SRE strategy. It didn't even come into common usage until around 2017 (according to wikipedia). If I'm remembering correctly, the SRE book recommends it, and that's why it gained popularity.

While Google may have coined this term in 2017, the idea of keeping all of an organization's code into a single repo was around since forever. The ~1k dev company I work for had a single Perforce repo with history going back to 1998 or something like that.

> Also, I don't believe that Linux is a valid interpretation of "monorepo". Linux is a singular product. You can't build the kernel without all of the parts.

This is probably the core of disagreement actually. Linux of course has all sorts of internal parts that can be considered libraries/modules. They famously have a huge amount of drivers, but there are also things like kernel-space implementations of much of the C stdlib, such as kmalloc, compression libraries, a unit testing framework, the ebpf compiler and runtime, more than 40 file systems, network stacks for various protocols, and so many more. Nothing prevents the kernel team from splitting up the kernel into a core 'app' + tens of libraries and add tooling to stitch these together from separate repos (since it's C, you could theoretically do all of this at the single source file level, even). Not to mention, if you want to use a 3rd party library in Linux, there is only one way to do it: you copy its source into the kernel source tree.

Of course, no one is suggesting such a thing, because it's well accepted that the kernel is a 'single product'. But multiple-repo advocates usually miss the fact that everyone starts with a single repo and a single product, and then they gradually isolate parts of that product as 'libraries', and they gradually split up functionalities as 'separate products', and it's not always clear when this separation is actually finished enough to make the decision to split it off into its own repo, if ever.

> Any performance improvement in git is welcome, but anything that sacrifices a full clone of the entire repository is antithetical to decentralization.

> The whole point of git is decentralized source code.

Partial clones are still decentralized source, these concepts are completely orthogonal. As long as I can clone all of the code I am working on and all of its history, it obviously makes no difference if this is a single repo from a multi-repo project or it is a single part of a monorepo.

Also, the whole point of git is managing source code history. The decentralized part is only important for decentralized projects, like Linux. Most projects, either private or public, work in a centralized way, and maintain a central source control server, and local git clones are just nice-to-have caches.

And I want to emphasize again: I am talking about most open source projects here as well, even some of that large ones like Apache. Even GNU typically works in this centralized manner: you get the central repo for some project, you make changes, you rebate your changes on master, you send a patch for review to the mailing list, and if the patch is accepted, you ore some maintainer commit it to the central repo.

In a decentralized workflow like Linux, you start off by cloning one or more relevant authoritative repos (Linus' repo, or Debian's repo etc), you make your changes, you format them as a patch, and you send the patch to the maintainers of the repos you want to change - e.g. Debian's repo for a security fix maybe, or the kernel maintainer for the subsystem you want to change. If they like your patch they take it and put it in their repo. If it's important enough, it will then slowly percolate through the ecosystem in various ways - the maintainer will eventually push it to Linus to be included in an official release, and Debian will eventually take it from Linus' repo. Some mainly not even wait for it to make it to an official Linux release - a cutting-edge distro may directly take changes from another maintainer's repo.

There are extremely few projects that work in this manner.

Edit: added a few more details on just how much stuff is in the Linux source tree, and how much they work exactly like the linked article claims will never scale (no dependency management, no 3rd party artifacts etc).

CRConrad · on Nov 13, 2021

> Nothing prevents the kernel team from splitting up the kernel into a core 'app' + tens of libraries

Hmm -- would a different gitting strategy on the part of the kernel dev team perhaps influence their take on the whole monolithic vs microkernel question? :-)