
Please don't. It's just too slow and inefficient. Instead, use common open-source best practices of shared-library architecture. Problem solved! Putting everything into one repo is just a lack of organization and creates a huge mess.



I feel like you've really done no work supporting your argument there. "Slow and inefficient"... what, exactly, is slow and inefficient? Because there are plenty of things slow and inefficient about polyrepos.

I'd say that open-source best practices for shared libraries are appropriate if you're making an open-source shared library. However, these practices are inappropriate for internal libraries, proprietary libraries, and other use cases. In my experience, it's also far from "problem solved". You can point your finger at semantic versioning, but in the meantime we go through hell and back with package managers trying to manage transitive library dependencies, and it SUCKS. Why, for example, do you think people got fed up with NPM and created Yarn? Or why do people constantly complain about Pip / Pipenv and the like? Why was the module system in Go 1.11 such a big deal? The answer is that it's hard to follow best practices for shared libraries, and even when you do follow them, you end up with mistakes or problems. These take engineering effort to solve. One of the solutions available is to use a monorepo, which doesn't magically solve all of your problems; it solves certain problems while creating new ones. You have to weigh the pros and cons of the approaches.

In my experience, the many problems with polyrepos are mostly replaced with the relatively minor problems of VCS scalability and a poor branching story (mostly for long-running branches).


> However, these practices are inappropriate for internal libraries, proprietary libraries, and other use cases.

Why do you say so?


Basically because for certain projects and teams, the effort to package internal / proprietary libraries and other similar dependencies can be much larger than the benefit. Packaging is effort: you have to cut a release, stamp a version number, write a changelog, package and distribute it, and then backport fixes into long-running branches.
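
To make the cost concrete, here is a rough sketch of what even a "minimal" internal release script ends up doing. Everything in it is hypothetical (the VERSION file, the changelog layout, the package index named "internal"), it assumes the 'build' and 'twine' tools are available, and it still ignores the part that hurts most, backporting fixes into release branches:

    #!/usr/bin/env python3
    """Hypothetical release script for an internal Python library (illustration only)."""
    import subprocess
    import sys
    from pathlib import Path

    VERSION_FILE = Path("VERSION")      # assumed single source of truth for the version
    CHANGELOG = Path("CHANGELOG.md")

    def bump_patch(version: str) -> str:
        major, minor, patch = version.split(".")
        return f"{major}.{minor}.{int(patch) + 1}"

    def main() -> None:
        old = VERSION_FILE.read_text().strip()
        new = bump_patch(old)

        # Collect commit subjects since the previous release tag for the changelog entry.
        log = subprocess.run(
            ["git", "log", f"v{old}..HEAD", "--pretty=format:* %s"],
            check=True, capture_output=True, text=True,
        ).stdout
        CHANGELOG.write_text(f"## {new}\n\n{log}\n\n" + CHANGELOG.read_text())
        VERSION_FILE.write_text(new + "\n")

        # Record the release in version control.
        subprocess.run(["git", "commit", "-am", f"Release {new}"], check=True)
        subprocess.run(["git", "tag", f"v{new}"], check=True)

        # Build and publish; assumes the 'build' and 'twine' tools and a configured
        # internal package index called "internal".
        subprocess.run([sys.executable, "-m", "build"], check=True)
        dist_files = [str(p) for p in Path("dist").glob("*")]
        subprocess.run(["twine", "upload", "--repository", "internal", *dist_files], check=True)

    if __name__ == "__main__":
        main()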

This effort makes a lot of sense if your consumers are complete strangers who work for other organizations. If your consumers are in the same organization, then there are easier ways to achieve similar benefits. See Conway’s Law: it’s not an accident that code structure reflects the structure of the organization that created it, and I would claim that organizational boundaries should be reflected in code. Introducing additional boundaries between members of the same organization should not be done lightly.

One of the main benefits of version numbers is that they tell your consumers where the breaking changes are, but if you have direct access to your consumers’ code and can commit changes, review them, and run their CI tests, then you have something much better than version numbers. If you are running different versions of various dependencies, you can end up with a combinatorial explosion of configurations. Then there’s the specter of unknown breaking changes being introduced into libraries. It happens; you can’t avoid it without spending an unreasonable amount of engineering effort, but the monorepo does make the changes easier to detect (because you can more easily run the tests of downstream consumers before committing).
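
As a toy illustration of what "run their CI tests" can look like in a monorepo (the directory layout, the reverse-dependency map, and the use of pytest are all assumptions made up for this sketch):

    #!/usr/bin/env python3
    """Toy sketch: run the tests of downstream consumers of whatever changed."""
    import subprocess
    import sys

    # Hypothetical reverse-dependency map: package directory -> packages that depend on it.
    DEPENDENTS = {
        "libs/auth": ["services/api", "services/billing"],
        "libs/logging": ["services/api", "services/worker"],
    }

    def changed_packages(base: str = "origin/main") -> set[str]:
        """Top-level package directories touched by the current change."""
        diff = subprocess.run(
            ["git", "diff", "--name-only", base],
            check=True, capture_output=True, text=True,
        ).stdout.splitlines()
        return {"/".join(path.split("/")[:2]) for path in diff if "/" in path}

    def main() -> None:
        affected = set()
        for pkg in changed_packages():
            affected.add(pkg)
            affected.update(DEPENDENTS.get(pkg, []))
        for pkg in sorted(affected):
            # Run each affected package's own test suite before the commit lands.
            subprocess.run([sys.executable, "-m", "pytest", pkg], check=True)

    if __name__ == "__main__":
        main()

The polyrepo equivalent generally means publishing a pre-release, bumping pins in each consumer repo, and waiting on several separate CI pipelines.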

Cross-cutting changes are also much more likely for certain types of projects. These are difficult with polyrepos for obvious reasons (most notably, the fact that you can’t do atomic commits across repos).

Packaging systems also have administrative overhead. If you shove everything in a monorepo you can ditch the packaging system and spend the overhead elsewhere. These days it’s simple enough to shove everything in the same build system.

Various companies that I’ve worked for have experimented with treating internal libraries the same way that public libraries are treated—with releases and version numbers. Most of them abandoned the approach and reallocated the effort elsewhere. The only company that I worked for that continued to use internal versioning and packaging was severely dysfunctional. One startup I worked for went all in on the polyrepo approach and it was a goddamn nightmare of additional effort, even though there were only like three engineers.


I broadly agree with all of this, though I think it's possible to simplify the business of packaging and releasing with the right automation. But lowering the cost doesn't change the more important question of whether that cost is worth bearing.

> One of the main benefits of version numbers is that they tell your consumers where the breaking changes are, but if you have direct access to your consumers’ code and can commit changes, review them, and run their CI tests, then you have something much better than version numbers.

A small peeve of mine: Semver, and version numbers generally, are lossy compression. They try to squeeze a wide range of information into a very narrow space, for no other reason than tradition.


I really don't understand what you describe as effort or a huge burden. Writing a simple script that handles your release tasks is easy. IMO a lot of engineers just want to write code, but a lot of the time building software consists of other things too, such as testing, releasing, documentation, etc. Simply avoiding them doesn't make it better.


If you think that releasing comes down to a simple script then you and I have radically different ideas about what it means to release something.

I’m also completely baffled by your statement that “simply avoiding them doesn’t make it better.” Reading that statement, I can only feel that I have somehow failed to communicate something, and I’m not really sure what, because it seems obvious to me why the premise of this statement is wrong. When you avoid performing a certain task, like releasing software, which costs some number of work hours, you can reallocate those work hours to other tasks. It’s not just that the tasks of releasing and versioning stop happening; you also get additional hours to accomplish other things which may be more valuable. So it’s never an issue of “simply avoiding” some task; at least on functional teams, the issue is choosing between alternatives.

And it should also be obvious that cutting discrete releases for internal dependencies is not an absolute requirement, but a choice that individual organizations make depending on how they see the tradeoffs and on their particular culture.

There really are many different ways to develop software, and I've seen plenty of engineers get hired and completely fail to adapt to some methodology or culture that they're not used to. The polyrepo approach, with discrete releases cut with version numbers and changelogs, is a very high-visibility way of developing software and it works very well in the open source world, but for very good reasons many software companies choose not to adopt these practices internally. It's very sad when I see otherwise talented engineers leave the team for reasons like this.


Too slow as in "to do it", or too slow as in "to use it"? In either case, I think if that were true there wouldn't be monorepos at Google, Facebook, and Microsoft. I will say it's true that this didn't come for free, e.g. Microsoft had to make GVFS due to the sheer enormity of their codebase, but that's already done and works pretty well.

I agree shared-library style makes more sense in most cases though. The main problem with it is forcing everyone to use the latest library versions, but that isn't insurmountable by any means.


My old boss was an engineering manager at Google in the 90s and early 2000s. He used to tell us that _everyone_ he interacted with at Google _hated_ the monorepo, and that Google’s in-house tooling did not actually produce anything approaching a sane developer experience. He used to laugh so cynically at stories like that big ACM article touting Google’s use of a monorepo (which was a historical unplanned accident built on top of a poorly planned Perforce repository way back when), because in his mind, his experience with monorepos at Google was exactly why his engineering department (several hundred engineers) at my old company did not use a monorepo.


His experience from the 90s and early 2000s is meaningless in the current era. Version control and Google were in their infancy.

SVN was first released in 2000, Git in 2005. Branching, tagging and diffing were nowhere near what is possible now.

That goes back to desktops with disks smaller than a GB, CPUs in the tens of MHz, and networks so slow and unreliable, if you had one at all.


My understanding from many Google employees is that the properties of the system that caused problems in ~2000–2010 are largely still the same today: the canary-node model of deployment, a fixed small set of supported languages, code bloat, the inability to delete code, a bias towards feature toggles even when separate library dependency management would be better for the problem at hand, various firefighting when in-house monorepo tooling breaks, difficult onboarding for people unfamiliar with that workflow, and difficult recruiting, since some candidates refuse to join if they have to work under the limits of a monorepo like that.


I work at one of the monorepo companies that you mention and there’s some truth to the “too slow” part. Although it’s been a lot better lately (largely due to the efforts of the internal version control dev teams), I’ve noticed at times in the past that you could do a ‘<insert vcs> pull’, go on a 15-minute break, and it wouldn’t be done by the time you got back.

Personally, I think there’s a place for monorepos and there’s a place for smaller independent repos. If a project is independent and decoupled from the rest of the tightly coupled codebase (for instance, things which get open-sourced), it makes no sense to shove it into a huge monorepo.


I hate how these monorepo pieces gloss over the CI requirements. Just check out the code that's affected by the change? Either you have a shared build job that adds thousands of builds a day and matching a commit to a build takes ages, or you have a plethora of jobs for each subrepo and Jenkins eats all the disk space with stale workspaces. And let's not talk about how to efficiently clone a large repo... our big repo took 5 minutes to clone from scratch, which killed our target time of 10 minutes from push to test results. We ran git mirrors on our build nodes to have fresh git objects to shallow/reference clone from, which got it down to 30 seconds, and the whole system had to work perfectly or else hundreds of devs would be blocked waiting to see if their changes could be merged.
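
For anyone curious, the setup was roughly shaped like the sketch below; this is not our actual tooling, the URL and paths are invented, and only the git flags are the standard ones:

    #!/usr/bin/env python3
    """Sketch of a mirror-plus-reference-clone setup for build nodes (illustration only)."""
    import subprocess
    from pathlib import Path

    UPSTREAM = "git@git.example.com:big/monorepo.git"   # hypothetical upstream
    MIRROR = Path("/var/cache/git/monorepo.git")        # refreshed periodically on each build node
    WORKSPACE = Path("/builds/monorepo")

    def refresh_mirror() -> None:
        """Keep a bare mirror of the repo warm so clones can borrow its objects."""
        if MIRROR.exists():
            subprocess.run(["git", "-C", str(MIRROR), "fetch", "--prune"], check=True)
        else:
            subprocess.run(["git", "clone", "--mirror", UPSTREAM, str(MIRROR)], check=True)

    def fast_clone(branch: str) -> None:
        """Clone a branch for a build, borrowing objects from the local mirror."""
        subprocess.run(
            ["git", "clone", "--reference", str(MIRROR), "--branch", branch,
             UPSTREAM, str(WORKSPACE)],
            check=True,
        )

    if __name__ == "__main__":
        refresh_mirror()
        fast_clone("main")

A plain shallow clone helps too, but you still pay the network transfer on every build, which is why the local mirror is doing most of the work here.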


The last time I worked in a massive monorepo, half of my team was running git fetch as a cron job. It was an extremely painful experience.


It would be quite remarkable if in-house corporate software, which faces different constraints and challenges than open source software, turned out to nonetheless have exactly the same best practices.


The idea of using open-source-style practices for internal development is not exactly new or remarkable. It's something people have been doing for a long time.

https://en.wikipedia.org/wiki/Inner_source



