What is the best and right way to open-source packages from a company monorepo?

raziel2p · on June 1, 2020

why does the package you want to split out need to remain part of the monorepo? in my mind, if you open-source packages, you should treat them just as you would external packages that aren't maintained by you - either install them with your language's package manager, use git submodules, or add init scripts that git clone them into the correct path (git-subrepo seems like it's facilitating this).

> Even if there are two public monorepos out in the open you can have similar problems trying to collaborate, because to modify one line of a package, you may need to pull a huge monorepo and its tooling down.

in my experience this has never really been a problem.

compsciphd · on June 1, 2020

What he said. The question is what you are trying to accomplish. Are you just trying to make code dumps (i.e. no real development is done on the OSS side, you just dump code there that others can use), or is development supposed to take place on the OSS side?

If the former, it's not really a question: just figure out how to dump something that can be built independently of your mono repo and provide dumps.

If the latter, separate it out from your mono repo and treat it like any other external dependency you import into your mono repo. this obviously takes more work managing the new external repo, cutting releases and the like, but is more valuable to the community at large.

vaughan · on June 1, 2020

The goal is to allow development to happen on the OSS side. Unless you are a Google or Facebook with the ability to properly resource OSS, the 1st approach just leads to abandoned projects.

> this obviously takes more work managing the new external repo, cutting releases and the like, but is more valuable to the community at large.

I guess the gist of my question is how to keep the benefits of a monorepo while allowing an OSS development workflow.

It feels like OSS works best when there is a single GitHub repo that does not depend on a ton of other dependencies that change often.

ssivark · on June 1, 2020

I’m very confused. Open sourcing makes it an independent project (might as well be external), and that’s necessary for the project to be of use to anyone else (Imagine one of your dependencies being part of someone else’s monorepo). In case that’s successful, you can handle it just like any other library dependency. (How do you do that in your monorepo?)

To elaborate... How can the community participate in the project if it is not a relatively independent project? If things are closely coupled to one person’s monorepo, then presumably the code is not usable for another person. So, for practical purposes, it might as well be just a code dump.

compsciphd · on June 1, 2020

As we said, presumably your mono repo imports external dependencies, right? It's not all developed internally.

This just becomes another external dependency to be managed, albeit one whose release schedule you have more control over.

if this item depends on many other things that are also internal projects that you don't want to release, you will have to abstract them out and make the implementations pluggable with a working implementation that is part of the OSS side.

vaughan · on June 1, 2020

> why does the package you want to split out need to remain part of the monorepo

It becomes more difficult to maintain and you lose the advantages of a monorepo (atomic commits/branching, etc.)

Submodules don't work well with branching, and publishing to a package manager now means you need to symlink.

It becomes quite painful quickly if the open-sourced dependency changes a lot.

baslas · on June 1, 2020

I have never used this tool but Google open-sourced Copybara[1]. If you look at the commits of their other open-source repositories, it seems this tool is often used.

[1] https://github.com/google/copybara

vaughan · on June 1, 2020

Thanks, hadn't heard of this one.

Looks a bit heavy though for my liking but maybe they have some good ideas.

> Copybara requires you to choose one of the repositories to be the authoritative repository

Looks like it is more angled at mirroring private to public.

mbrukman · on June 1, 2020

Disclosure: I work at Google, but not on Copybara.

I've seen Copybara used at Google in both directions: for some projects, the internal repo is the authoritative one, and for others, the external repo is authoritative.

Copybara is not prescriptive, you can go in either direction.

Someone · on June 1, 2020

Named after https://en.wikipedia.org/wiki/Capybara, I guess?

steeve · on June 1, 2020

Came here to say that

quicklime · on June 1, 2020

It seems like the HN consensus from the existing comments is to make the source of truth the public repo, and import that repo into the monorepo build somehow. This works in a lot of cases, but it does come with some drawbacks. Basically you will lose a lot of the benefits of a monorepo:

- You can't make atomic commits across the open source repo and the internal monorepo.

- Changes to the open source project won't automatically trigger internal integration tests.

- Your coworkers can no longer just run `bazel build` or `bazel test` on your project, so there's a relatively large amount of friction for them before they can make changes.

I don't think there's a simple answer to this question yet, but a few things to consider based on my experience with this:

- If you expect contributions to mostly come from internal developers, then maybe lean towards keeping it in the monorepo, but if you think contributions will come from external developers, lean towards an external repo.

- If it's going to be a mix of both, it's going to be difficult, so make sure you regularly sync the two repos, otherwise you're going to have to spend a lot of time resolving conflicts.

- Think about what build system you're going to use (usually it'll be something like bazel or buck inside the monorepo, but some people prefer language-specific ones for open source repos, e.g. cmake, gradle, yarn/npm). If you decide to use separate build systems internally and externally, make sure both have CI systems in place that will catch build errors.

temikus · on June 1, 2020

Google’s internal open source releasing policy is actually public: https://opensource.google/

It’s written from a more process/legal viewpoint but you might be able to pick up some ideas in there.

antoncohen · on June 1, 2020

A couple specific quotes that are relevant:

> If you are planning to regularly mirror the source from Google internal repos to public ones (or vice versa) Copybara provides workflows and tools for this.

https://opensource.google/docs/releasing/preparing/#tools

> Google-owned open source projects must be moved to third_party prior to being released under an open source license, even if Google owns 100% of the code, because the projects are expected to receive external contributions.

https://opensource.google/docs/thirdparty/

staktrace · on June 1, 2020

We have this problem at Mozilla. The Firefox codebase is a monorepo and the source of truth for the WebRender project, which lives in a subtree. The source of truth used to be a separate GitHub repo that was synced into the monorepo regularly, but we flipped that around and now we sync back out to the GitHub repo. PRs do come in against the GitHub repo, and we have a bit of ad-hoc tooling that imports the PR into our monorepo's code submission flow (Bugzilla + Phabricator). It lands in the monorepo assuming it passes review and tests, and then gets synced back to the GitHub repo.

That being said the tooling we have is not great - there are still some manual steps in this flow, and it's not ideal.

For the webgpu project that's in a similar situation but gets a lot more external contributions, we will eventually try some sort of two-way sync.

So sorry I don't have a good solution for you but you're not alone with respect to this problem :)
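
For anyone wondering what "importing a PR into the monorepo" can look like with plain git, here is a rough sketch. It is not Mozilla's tooling; it only assumes the public repo's root corresponds to a subdirectory of the monorepo. The path `gfx/wr`, the file names, and the commit messages are all made up for the demo, and everything runs in a throwaway temp directory.

```shell
set -eu
work=$(mktemp -d)
setup_identity() {
  git config user.email "dev@example.com"
  git config user.name "Dev"
}

# Monorepo: the library lives under gfx/wr (hypothetical path).
git init -q "$work/mono"
cd "$work/mono"
setup_identity
mkdir -p gfx/wr
echo "v1" > gfx/wr/lib.txt
echo "browser" > app.txt
git add .
git commit -q -m "initial import"

# Stand-in for the public repo: same library, but at the repo root.
git init -q "$work/public"
cd "$work/public"
setup_identity
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "mirror of gfx/wr"

# An external contributor's change, exported as a mailbox-style patch.
git checkout -q -b pr-fix
echo "v2" > lib.txt
git commit -q -am "lib: bump to v2"
git format-patch -1 --stdout > "$work/pr.patch"

# Replay the patch inside the monorepo, re-rooted under gfx/wr.
# Author and message are preserved; review and tests would happen here.
cd "$work/mono"
git am -q --directory=gfx/wr "$work/pr.patch"
git log -1 --stat
```

The sync back out to the public repo is a separate step (for example a subtree split or a Copybara workflow), which this sketch leaves out.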

vaughan · on June 2, 2020

Thanks for the insight. Good to know I'm not alone.

It feels orders of magnitude easier to make the internal repo the source of truth when developing and a huge sacrifice to productivity to split something out.

If there were a good solution, I think it could really help more company-sponsored open source projects get out there in the wild.

svrtknst · on June 2, 2020

Wouldn't it be possible to flip things around, e.g. the source of truth is a standalone public repo that gets forked into the monorepo? So you could then make changes internally, and afterwards isolate the library changes and push them upstream.

idk how monorepos work really so maybe im way wrong

cuspycode · on June 1, 2020

I've had good results from using `git filter-branch --index-filter` to keep everything related to the exported package and remove everything else. This keeps commit history in such a way that `git log --follow` still shows older commits in the new repo, which is nice. I could never figure out how to make that work with `git subtree`.
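
A minimal sketch of that approach, run against a throwaway repo so nothing real gets rewritten. The layout (`packages/mylib`, `services/internal`) is hypothetical, and the filter here removes one known private directory rather than computing "everything else", so a real monorepo would need a longer list of paths. Because the exported files keep their original paths, `git log --follow` still finds the older commits.

```shell
set -eu
# filter-branch prints a deprecation notice and pauses unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical monorepo: one package to publish, one directory to keep private.
mkdir -p packages/mylib services/internal
echo "v1" > packages/mylib/lib.txt
echo "secret" > services/internal/app.txt
git add .
git commit -q -m "initial import"
echo "v2" > packages/mylib/lib.txt
git commit -q -am "mylib: bump to v2"
echo "secret2" > services/internal/app.txt
git commit -q -am "internal-only change"

# Rewrite a clone, never the monorepo itself.
git clone -q "$work/mono" "$work/export"
cd "$work/export"

# Drop the private directory from the index of every commit;
# commits left with no changes are pruned.
git filter-branch --prune-empty \
  --index-filter 'git rm -r -q --cached --ignore-unmatch services' \
  HEAD

git log --oneline --follow -- packages/mylib/lib.txt
```

The git documentation now recommends `git filter-repo` over `filter-branch` for this kind of rewrite, so treat the above as an illustration of the idea rather than a recommendation.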

zackbrown · on June 1, 2020

`git subtree split` will extract a subdirectory and its (exclusive) history to a new repo.

Publish that repo and maintain it separately.
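
A small sketch of what that looks like end to end, assuming `git subtree` is installed (it ships in git's contrib area and some distributions package it separately). The directory and branch names (`packages/mylib`, `mylib-export`) are made up for the demo.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical monorepo with one publishable package.
mkdir -p packages/mylib services/internal
echo "v1" > packages/mylib/lib.txt
echo "secret" > services/internal/app.txt
git add .
git commit -q -m "initial import"
echo "v2" > packages/mylib/lib.txt
git commit -q -am "mylib: bump to v2"
echo "secret2" > services/internal/app.txt
git commit -q -am "internal-only change"

# Build a branch whose root is packages/mylib, containing only the
# commits that touched that directory.
git subtree split --prefix=packages/mylib -b mylib-export

# Seed a standalone repo from that branch; this is what would be published.
git init -q "$work/mylib"
cd "$work/mylib"
git pull -q "$work/mono" mylib-export
git log --oneline
```

Unlike the `filter-branch` variant, the files move to the repo root here, so paths in the new history differ from the monorepo's.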

wallstprog · on June 1, 2020

We have the same issue, except in both directions -- we use open-source projects as part of our software, and have also open-sourced a component that we developed, so we need to be able to go both ways.

The following seems to be working reasonably well so far:

- Each component has its own internal repo (in our case, we happen to use BitBucket). This repo contains any bits that are proprietary and/or specific to our environment, like build scripts, integration with internal CI, etc.

- The open-source part of the component is hosted in an external repo on GitHub.

- The internal repo includes the contents of the external repo using git submodules (https://git-scm.com/docs/gitsubmodules).

We use a convention re: branch names:

- For projects that we don't own, "staging" is where our changes go before being submitted upstream as PRs.

- For projects that we do own, "staging" represents the current development "HEAD" -- our internal "master" branch includes the "staging" branch from the external repo.

- In either case, once a change has been approved for production use, it is committed to a release branch that is used to drive production builds.

Git submodules have a couple of features that make this easier:

- Each branch in the internal repo has its own .gitmodules file, which identifies which branch to fetch from the external repo. The .gitmodules file is an ordinary text file, and is managed just like any other file in the repo.

- Updating the submodule code in the internal repo is done by recording the hash of the specific commit from the external repo. This lets us control precisely which code from the external repo is included in builds (either development or production builds), and also provides an audit trail.

We used subtrees initially, and if there's not much traffic back and forth between the internal and external repos that can work, but it breaks down quickly as the repo becomes more active.
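
To make the submodule mechanics above concrete, here is a sketch using two local throwaway repos in place of the real internal and GitHub repos. The names (`external`, `internal`, `vendor/mylib`) are invented for the demo, and `-c protocol.file.allow=always` is only needed because the "remote" is a local path; it would not be needed for an HTTPS or SSH URL.

```shell
set -eu
work=$(mktemp -d)
setup_identity() {
  git config user.email "dev@example.com"
  git config user.name "Dev"
}

# Stand-in for the public repo, with a "staging" branch.
git init -q "$work/external"
cd "$work/external"
setup_identity
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "mylib v1"
git checkout -q -b staging
echo "v2" > lib.txt
git commit -q -am "mylib v2 (staging)"

# Internal repo: include the external code as a submodule tracking "staging".
git init -q "$work/internal"
cd "$work/internal"
setup_identity
git -c protocol.file.allow=always submodule add -b staging "$work/external" vendor/mylib
git commit -q -m "Add mylib submodule tracking staging"

cat .gitmodules                 # ordinary text file: path, url, branch
git ls-tree HEAD vendor/mylib   # the pinned commit hash (a "gitlink" entry)

# Later: new work lands on the external staging branch...
cd "$work/external"
echo "v3" > lib.txt
git commit -q -am "mylib v3 (staging)"

# ...and the internal repo moves its pin explicitly, as a reviewable commit.
cd "$work/internal"
git -c protocol.file.allow=always submodule update --remote vendor/mylib
git add vendor/mylib
git commit -q -m "Bump mylib to latest staging"
```

Nothing changes in internal builds until that last commit is made, which is where the audit trail comes from.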

vaughan · on June 2, 2020

The downside of submodules is you lose the ability to easily branch across all packages/repos and then just commit.

You could branch the other project but then you have to coordinate this yourself. There is a project called Meta that can help but to me feels like it can quickly get out of control.

Do you find this a problem?

quantummkv · on June 1, 2020

The best approach would be to extract the package into a separate repository and use that as a regular package in your monorepo. This would solve all your problems.

timini · on June 1, 2020

+1 git subrepo