Git submodules revisited (dev.to)
102 points by fanf2 on May 12, 2018 | 55 comments



I'm unsure why most folks seem to get into a pickle over git submodules but I think it's essentially due to the mental model of how they think about the structure of the "root" project and its relation to the submodules. This in turn seems to govern how they work with the root project and its submodules.

I kind of think of the relationship as a pointer into what should be an entirely separately managed project. Changes to that project should happen in that project and only then should the "pointer" be modified. I usually go as far as checking out the submodule separately (to my "projects" directory) when I need to work on it, which is of course entirely unnecessary, but for me helps keep that separation.

Another approach I've found that helps to combat issues some people have with them is to not have them littered throughout a project but to have a very clear delineation between submodule files and the git-managed files of the "root" project, i.e. an appropriately named, usually top-level, directory.

In practice I find them an extremely useful way of getting stuff done effectively and efficiently!


I agree. I use them extensively with Yocto and the different layers (submodules) that make up my operating system. People have an aversion to them, but I find them easy to work with and very useful.


Effective? Yes. Intuitive? Not so much so.

Your hacks are certainly worth exploring. But they are also solutions to why other people get pickled. Good for you. Not so for them.

I'd be curious to know if you use the CL or a UI-based git tool? It would seem to me - once you get to the necessity / complexity of something that entails submodules - seeing would be believing. Trying to juggle a detailed picture in your head, __and write good code__, __and get your git commands right__ is not exactly a recipe for success.

In many ways git is a great tool. But the ways in which it is not are exposing its Achilles heel as we edge up on the second decade of the 21st century.

p.s. thx for sharing. I'm going to see about using your approaches.


git submodules is just /begging/ for tooling.

On the surface, the commands are fairly verbose. And (as the article pointed out) the documentation on it could use some love.

However, the biggest problem I have encountered while using it is that what code I actually have checked out is opaque without doing the song and dance of actually checking out each submodule & inspecting the dir. This sucks when evaluating the codebase without cloning, e.g. in code review on GitLab/GitHub. And the commands/UX for updating them are just painful and error-prone.

Really, the difference between putting something in a git submodule and putting something in a private npm/maven/etc. repo is that I can look at a file and read:

    my-dependency v1.1.5

And my human-brain can kind of know what that means.

Whereas if I look at a repo with a submodule, I just see a URL and a SHA. And while actually I think that this is a better model for keeping track of versions, it's terrible from a UX perspective.

There's also a whole host of tooling around your language's chosen artifact/package/dependency management that git submodules don't have yet. It's often supported by the same tools we use for compiling and other task running. Git submodules require another thing that lives outside of that ecosystem.

I kind of wish that a mainstream language would just adopt git submodules as part of their de facto package management strategy and build the tools on top of it we need to make it livable.
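
To make the "URL and a SHA" point concrete, here is a rough sketch of the kind of tooling that could sit on top. It assumes the URL comes from `.gitmodules` and the pinned commit comes from the gitlink entry (mode 160000) that `git ls-tree` prints for a submodule path. The sample data, the function names and the output format are all hypothetical, not anything git ships.

```python
# Hypothetical sample data: the text of a .gitmodules file and matching
# `git ls-tree HEAD` output. The path, URL and hashes are made up.
GITMODULES = (
    '[submodule "vendor/my-dependency"]\n'
    "\tpath = vendor/my-dependency\n"
    "\turl = https://example.com/my-dependency.git\n"
)

LS_TREE = (
    "100644 blob 0123456789abcdef0123456789abcdef01234567\t.gitmodules\n"
    "160000 commit 1f3a5c7e9b2d4f6a8c0e1b3d5f7a9c2e4b6d8f0a\tvendor/my-dependency\n"
)


def parse_gitmodules(text):
    """Return {path: url} from the text of a .gitmodules file."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[submodule"):
            current = {}
            entries.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return {e["path"]: e["url"] for e in entries if "path" in e and "url" in e}


def parse_gitlinks(ls_tree_output):
    """Return {path: sha} for submodule entries (mode 160000) in ls-tree output."""
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, objtype, sha = meta.split()
        if mode == "160000" and objtype == "commit":
            pins[path] = sha
    return pins


def describe_pins(gitmodules_text, ls_tree_output):
    """One human-readable line per submodule: path, URL and abbreviated commit."""
    urls = parse_gitmodules(gitmodules_text)
    pins = parse_gitlinks(ls_tree_output)
    return [
        f"{path} {urls.get(path, '?')} @ {sha[:12]}"
        for path, sha in sorted(pins.items())
    ]


for line in describe_pins(GITMODULES, LS_TREE):
    print(line)
```

This still only shows a hash, not something like `v1.1.5`; mapping the hash to a tag would need access to the submodule's own repository.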


You can use tags for human readable names of commit hashes. But never rely on tags for security!


I don't think the tag is actually a property of the submodule entry itself (as in, it's not written in .gitmodules), but yes, you can manually check out a tag.


> I kind of wish that a mainstream language would just adopt git submodules as part of their de facto package management strategy and build the tools on top of it we need to make it livable.

Every language has a package management system already. Git should be made easier to use, not harder. I see the utility of submodules to power users, but they're not really necessary, and they increase the complexity of git by adding yet another abstract concept.

Love you git, but you should be working on removing features, not adding them.


We're getting off on a tangent, but I actually think that dependency management makes way more sense to do in a VCS than some other tool provided by a specific language.

Look at the cluster that is dependency management in JavaScript, Go, etc. Every ecosystem is solving the same problem over and over, and languages can live and die by the quality and capabilities of their package management system.

It also sucks once you start building projects that are multi-language/ecosystem. It's why so many front-end technologies have just sucked it up and adopted NPM, because it has the most market share and continuing to bifurcate the package ecosystem is too cumbersome for consumers.

It would be much better to solve it /once/, with a tool that provides the lower level constructs. Then tools could be built on top of that to serve specific needs & improve the UX.

Maybe git submodules could be that? Whether it could or not, though, I do think that providing the ability to manage the versions of my dependencies in the same tool that I manage the versions of my own project... makes a lot of sense!


Whilst it's certainly a case of https://xkcd.com/927 I find that Nix is very good for what you describe; there's also Guix but I've yet to use it.

I think of them as "Make done right".


"it's terrible from a UX perspective."

Agree. The general concept is sound and needed but the execution / implementation is subpar.


I think git can easily claim the award for terrible UX. It was intended for Linux kernel development, not hobbyists or even medium-size teams that need to put something together.


Absolutely. It's a great idea. It's a great tool. But it needs a serious makeover to the point that it's not exactly the right tool for modern software product dev / engineering.

Providing software solutions is difficult enough. Your tools shouldn't add friction.


"Git's submodules are so universally derided that there's practically an entire industry devoted to providing alternatives for managing dependencies." Stop right there and repeat after me:

Git is not a dependency manager, git is not a dependency manager, git is not a dependency manager, git is not a dependency manager.

Seriously, use the package manager provided by your build system. Some of them can point to git repos if you don't have a proper package registry; this is a better solution than plain submodules, for many reasons: transitive dependencies, diamond dependencies, semantic versioning, can't forget to submodule update --recursive, etc, etc.


>Seriously, use the package manager provided by your build system.

Git is about source code. How would you use the package manager to manage source code?

The original author seems to be worried that some part of the system is not being updated to the newest available source before being built. I don't know of any system that would somehow automate that process.

The suggested submodule system might work as long as branch names don't change. But in real life they do change, old branches go out of maintenance. So in the end there must be a human making sure you don't miss anything.


> The original author seems to be worried that some part of the system is not being updated to the newest available source before being built. I don't know of any system that would somehow automate that process.

Subversion with svn:externals does exactly this by default, and generally you only peg a revision in an external dependency when you tag for release.


> this is a better solution than plain submodules, for many reasons: transitive dependencies, diamond dependencies, semantic versioning, can't forget to submodule update --recursive, etc, etc.

Well, one could argue that diamond dependencies are just bad design and that transitive dependencies are just submodules of submodules, leading to the conclusion that using submodules is like pinning the entire dependency tree to a particular commit. So for a given commit there is only one way to download the entire tree, and it is cryptographically secure, without trusting third parties.

I still wouldn't recommend submodules if your build system has packages, but it's an interesting idea nonetheless.


If I depend on library A and B which both depend on the standard library, I have a diamond dependency. How is that bad design?
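
For anyone unsure what counts as a diamond here, a minimal sketch: given a dependency graph, report every package that is reached through more than one direct dependency. The graph and the names `app`, `A`, `B` and `stdlib` are hypothetical, taken from the example in this comment rather than from any real package manager.

```python
def transitive_deps(graph, package):
    """All packages reachable from `package`, excluding the package itself."""
    seen = set()
    stack = list(graph.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def diamonds(graph, root):
    """Map each package reached via 2+ direct dependencies of `root` to those dependencies."""
    reached_via = {}
    for direct in graph.get(root, ()):
        for dep in transitive_deps(graph, direct) | {direct}:
            reached_via.setdefault(dep, set()).add(direct)
    return {dep: sorted(via) for dep, via in reached_via.items() if len(via) > 1}


# Hypothetical graph: the app depends on A and B, which both depend on stdlib.
GRAPH = {"app": ["A", "B"], "A": ["stdlib"], "B": ["stdlib"], "stdlib": []}

print(diamonds(GRAPH, "app"))
```

With nested submodules, `A` and `B` would each carry their own checkout of `stdlib`, possibly pinned at different commits, which is the case a package manager's solver is meant to reconcile.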


Standard library is usually an implicit dependency. Which package manager makes you explicitly name and version std lib?


Cabal is one example, which requires packages like `base` to be specified explicitly.

Cabal runs a dependency solver to satisfy most dependencies, but for compiler-provided things like `base` it simply checks whether or not the requirement is satisfied by the compiler being used (if not, it bails out).


I think C/C++ developers would love nothing more, but there is no standard package manager for those languages.


The worst thing about using git submodules is it's no longer a simple matter of `git clone url://to.repo` to clone a project. Then, if the person doing the clone is unfamiliar with submodules, it's unobvious how they go about fetching the submodules separately.

The next worst thing about submodules is that when you add a submodule to a project, if that submodule has submodules of its own, it's completely unobvious how to perform the submodule addition recursively. The `git submodule add` command doesn't recognize `--recursive`. IIRC the way you work around this is via the magic incantation `git submodule update --init --recursive` after adding the submodule that has its own submodules.

I really like submodules conceptually but the current UX surrounding their implementation is awful.


You can clone with submodules in one command: https://stackoverflow.com/questions/3796927/how-to-git-clone...

But I agree, maybe it should be turned on by default.


try using https://github.com/ingydotnet/git-subrepo that gives you the ability to simply clone a repo.


The point is that git doesn't do the right thing.

A random git user with just the URL for my submodule-using repository isn't going to know to use some special thing to clone the repository. They're going to run `git clone URL` and then be frustrated by the results.


I am curious... has anyone ever tried to convince git to fix the defaults?

Of course there's a little more than just this wrong with submodules but it seems like given how much hate they get, that someone would be interested in actually fixing them


my point was, that if you use git-subrepo instead, then you get exactly that.


git already supports `git clone --recursive URL` (also spelled `--recurse-submodules`), so there's no need to use git-subrepo to achieve the recursive clone.

The problem is the requirement of prior knowledge about the repository's use of submodules. The plain clone won't even report any sort of feedback about the submodules being present and skipped, nor any hint as to how to retrieve them: `git submodule update --init --recursive`.

The presence of git-subrepo and `git clone --recursive` is of little consequence from the perspective of the many users who are now familiar with the ubiquitous `git clone URL`.

Does github tell people to use git-subrepo to clone a given repository? Hell, does github even tell people to add `--recursive` when a repository uses submodules? I haven't checked, but don't recall ever seeing it do so in the past.

These are not difficult technical issues, it's just the sad state of the submodules UX in git. I presume it will improve eventually.
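
The missing feedback can be approximated outside git. A minimal sketch, assuming that a plain clone leaves each submodule path as an empty (or absent) directory while an initialised submodule contains at least a `.git` entry. The helper names and the wording of the hint are hypothetical; only the suggested command comes from this thread.

```python
import os
import tempfile


def submodule_paths(repo_dir):
    """Read the `path = ...` entries from <repo_dir>/.gitmodules, if present."""
    gitmodules = os.path.join(repo_dir, ".gitmodules")
    if not os.path.isfile(gitmodules):
        return []
    paths = []
    with open(gitmodules, encoding="utf-8") as fh:
        for line in fh:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "path":
                paths.append(value.strip())
    return paths


def uninitialised_submodules(repo_dir):
    """Return submodule paths whose working directory is missing or empty."""
    missing = []
    for path in submodule_paths(repo_dir):
        full = os.path.join(repo_dir, path)
        if not os.path.isdir(full) or not os.listdir(full):
            missing.append(path)
    return missing


def clone_hint(repo_dir):
    """Return a hint for the user, or None if nothing looks uninitialised."""
    missing = uninitialised_submodules(repo_dir)
    if not missing:
        return None
    return (
        "Submodules not checked out: " + ", ".join(missing)
        + "\nRun: git submodule update --init --recursive"
    )


# Demo on a fake checkout (no git involved): one declared submodule, left empty.
with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, ".gitmodules"), "w", encoding="utf-8") as fh:
        fh.write('[submodule "libs/a"]\n\tpath = libs/a\n\turl = https://example.com/a.git\n')
    os.makedirs(os.path.join(demo, "libs", "a"))
    print(clone_hint(demo))
```

Something like this could run from a build script so that a first-time contributor sees the hint rather than a confusing build failure.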


with git-subrepo, you just clone as normal, you don't even need git-subrepo installed. It just looks like a normal repo. You require 0 prior knowledge. You can update all the code in all the subrepos as a normal repo. Never ever knowing it has subrepos. The only people who need to know and have git-subrepo installed are those that need to sync the subrepos.


Thank you for the clarification, sounds like it's worth taking a closer look.


Let's report it in Git's bug tracker ... oh wait.


not sure what your "oh wait" is about... they apparently take bug reports over email. from https://git-scm.com/community:

> Questions or comments for the Git community can be sent to the mailing list by using the email address git@vger.kernel.org. Bug reports for git should be sent to this mailing list.


That's not a bug tracker IMHO.


For the vast majority of users git is bug free. If there is an issue with git nowadays it's highly specific and well suited to be handled in a mailing list.

This also makes it easier for the git devs to focus on actual development and not explain git to people unwilling to use google (the same people are too lazy to use email).


Git is already too horrendously complex. At this point, any feature whether worthy or not, has to be weighed against the learning curve required by new git users (already formidable).

Can't even count the number of times that we brought on a reasonably decent programmer who wrote decent working code but didn't have a clue about git, which resulted in the work being a mess of commits across multiple branches and forks.

To make git more powerful, the developers should make it easier to learn, not add power-user features.


I have yet to meet any programmer I’d consider “reasonably decent” that didn’t easily learn about branching, merging, and rebasing.

On the flip side every single person I’ve met that is either too stupid or too ignorant to learn the basics of using git also writes terrible code.


Totally disagree. The general flow of fork (sometimes?), branch, commit, merge, rebase, and/or squash is ridiculous. Eventually you'll also need to learn stashes, tags, remotes, merge requests, .gitignore, and who knows what. Most engineers just want to get their code into the main working repository. I use git every day, but it comes with so much baggage of vocabulary and ways things can go wrong. I get that git is powerful, it's just far too powerful for the vast majority of projects.

> every single person I’ve met that is either too stupid or too ignorant to learn the basics of using git also writes terrible code.

With git, a beginner programmer struggling to bang out some code now has to learn a whole additional system, just to save and share their code. Now, instead of teaching them on the language/product, you have to spend your time teaching them git. Of course experts know how to use git, but everyone else has to spend a week-plus learning a system that's not directly related to their job responsibilities.

It's somewhat similar to a lawyer learning Microsoft Word, an accountant with Excel, or teaching a draftsman how to hold a pencil... sure the good ones know how to do this already, but people would move so much faster if they could jump into things faster.


> "It's somewhat similar to lawyer learning Microsoft word, an accountant with Excel, or teaching a draftsman how to hold a pencil... sure the good ones know how to do this already, but people would move so much faster if they could jump into things faster."

Agree. However, the difference is that __all__ of those have a universal de facto UI. Even when you write code, you see the result.

On the other hand, (CL-based) git requires you to keep a bunch of extra stuff in your head. You have to know what questions (read: commands) to ask in order to "see" what's going on, etc.

We live in a UI / UX world. In that context (CL-based) git feels like a fax machine. Couldn't / shouldn't there be something better? If disruption is such a great thing, why does git get a free pass with "this is how we've always done it"?

The irony baffles me.


> shouldn't there be something better?

Mercurial?

> why does git get a free pass with "this is how we've always done it"?

I'd argue that the git monoculture is a much more recent phenomenon than we remember, and driven largely by the hegemonic status of GitHub in pop-dev culture. Now the possibility of using varied tools seems remote because everyone wants to be in the same place as everyone else.

Even the way we talk about teaching and onboarding new devs is couched in terms of GitHub, and GitHub alone.


> Mercurial?

It's not.

The basic feature set is pretty much the same, and requires similar effort to grok. Some things are much harder to do in Mercurial, and some things are just confusing (multiple heads, pushing branches, multiple branching models, the "tip", etc.). Online platforms are worse (Bitbucket vs Github) in practice.


Have you tried Fossil?


The best alternative to submodules and subtrees I found is git-subrepo: https://github.com/ingydotnet/git-subrepo#benefits


I finally found a use case for git submodules: We have a git repo which provides a wrapper around compiling and packaging the Linux kernel for RISC-V, and we use a submodule to link to a specific release of Linux. (https://github.com/rwmjones/fedora-riscv-kernel)

However this also reveals the awkwardness of submodules:

* Everyone who clones the git repo either forgets or doesn't know that they have to do 'git submodule init' (and maybe update too? even I'm forgetting ...). So everyone asks us on IRC why it "doesn't work" and has to be told what to do. IMO git clone should also clone the submodules and set things up.

* You cannot add downstream patches to Linux this way. For a while we needed a small bug fix which wasn't in the Linux submodule (it's controlled by another company) so we had an awkward workaround in the build system to copy the whole submodule and apply the patch before building.

* The link is to a non-fast-forward branch of Linux, so the commit hash sometimes becomes invalid, which we don't notice until someone is trying to build it from scratch. You'd think that because both the module and submodule are hosted on github, that github wouldn't garbage collect the old commit hash, but that's apparently not how it works.

I don't know if there's some tool which solves this use case better, but git submodules are what we have.


>they have to do 'git submodule init' (and maybe update too? even I'm forgetting ...)

`git submodule update --init --recursive` combines init and update, and also does both recursively.


> * Everyone who clones the git repo either forgets or doesn't know that they have to do 'git submodule init' (and maybe update too? even I'm forgetting ...)...

`git clone --recursive`. Add that to your README’s getting started section. Sounds a bit like you aren’t providing a good onramp for contributors.

> * You cannot add downstream patches to Linux this way...

> * The link is to a non-fast-forward branch of Linux, so the commit hash sometimes becomes invalid... You'd think that because both the module and submodule are hosted on github, that github wouldn't garbage collect the old commit hash, but that's apparently not how it works.

Commits will not get GCed if you have a ref pointing at them. Relying on commits that do not have a ref means they are not part of the official history of that project. You should fork linux and have a branch where you land your custom changes. You can track whatever upstream you want, third party or mainline linux, and rebase your changes on top of them.
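
The reachability argument can be sketched with a toy model: a commit is safe as long as some ref reaches it by following parent links, and a submodule pin that no ref reaches is a candidate for disappearing. This is a simplification (real git also consults reflogs and other roots, and hosting services have their own retention behaviour), and the commit ids, ref names and helper names below are all hypothetical.

```python
def reachable(parents, refs):
    """Commits reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def dangling_pins(parents, refs, pins):
    """Submodule pins ({path: commit}) that no ref reaches any more."""
    live = reachable(parents, refs)
    return {path: sha for path, sha in pins.items() if sha not in live}


# Toy history: c1 <- c2 <- c3 was the branch tip, then the branch was
# rewritten (non-fast-forward) to c1 <- c2b.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c2b": ["c1"]}
REFS_BEFORE = {"refs/heads/riscv": "c3"}
REFS_AFTER = {"refs/heads/riscv": "c2b"}
PINS = {"linux": "c3"}  # the superproject still points at the old tip

print(dangling_pins(PARENTS, REFS_BEFORE, PINS))
print(dangling_pins(PARENTS, REFS_AFTER, PINS))

# Keeping any ref on the pinned commit (a tag or a branch in your own fork)
# makes it reachable again.
REFS_WITH_TAG = dict(REFS_AFTER, **{"refs/tags/pinned-for-fedora": "c3"})
print(dangling_pins(PARENTS, REFS_WITH_TAG, PINS))
```

In this model the fix is exactly the one suggested above: keep a ref of your own on whatever commit the superproject pins.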


Your CI system should build every commit from scratch, then you detect the problem right away. And don't rewrite history on branches which other things refer to, you are destroying the ability to check out old versions.

I don't understand why you cannot add your own patches though? Just need to push your fork somewhere available and then ref that in your submodule?


submodules always seem good in theory, but fall apart anytime they need to be updated on a regular basis as part of a dependency ecosystem. We tend to use them as shorter term solutions when we have something we know we want to be an “official artifact” eventually but don’t want to spend the overhead setting up a build/publish for that library.

In the sense of working with git submodules, it can be quite painful sometimes, as most HN readers here know. The process of installing and updating them, in my experience, does not follow the principle of least astonishment. When dealing with a project using multiple submodules and switching between different branches, it’s difficult to get the right combination of submodule dependencies on the first try.

submodules are a tool like any other and should be used as such. For us that usually means that we want to make a semantic separation between two different projects, but in reality only one project uses the other for a while, so it makes sense to accept the lower overhead of a submodule dependency vs a “real” dependency.


Submodules simply ask you to explicitly set the revision of the module that you want to work with, especially when you need to upgrade them. That seems reasonable.


We have a situation at work where for security concerns, we want to grant a set of employees access to only a subset of a monolithic git repo. We investigated using subtrees, but it requires a little more manual intervention than we'd like.

So now I'm working on a project to write a single binary that can be installed as a server-side git hook, which would publish a subdirectory as its own repo and then sync commits and branches between the two.

Of course, it's not perfect. It won't be able to handle signed commits or tags. And the repositories have to be on the same box (it uses symlinks, but that could in theory be worked around at the cost of a more complex setup script).

But it should be completely transparent to the repository users.


Check out copybara


Oh, that does look cool. I can see how that would be pretty useful, but it does look like it's a manual process that needs to be triggered, and not automatic. Still, it looks to be more widely applicable than what I'm building.


I use https://github.com/ingydotnet/git-subrepo and find it much nicer.

It means that most people using the repo don't have to worry about the other repos.


My experience is that submodules are useful and unproblematic. cd into the submodule and treat it as a normal git repo is the easiest way. Then when you’re done, cd .. and commit the new submodule commit hash.


Git submodules is just some hack apenwarr banged together one day. It is what it is.

Same with Pip (the Python package installer thingy) BTW: it's just a hack somebody put together to scratch an itch.


I think that's git subtrees. Unless he did both?


D'oh! You're right. My bad.

It's too late to delete the comment. Eternal proof of my pitiful human fallibility.



