I was digging into tutorials about Git internals recently and was surprised to learn just how straightforward the data structures are. Given how stupid-simple the data is, I'm all the more confused about how straightforward the CLI isn't.
There are very few, if any, ways to accumulate and amplify errors in the system. It's no wonder Linus created a working prototype so fast. It's the simplest thing that could possibly work. But this simplicity is also the reason why binaries occupy so much space.
A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique, and then a group of people working together to match the implementation verbatim to the proven technique.
Pijul seems to be stepping into that middle space, but it's not clear to me if we're going to follow, or if something else like it will get us moving. I do like the concept, but as someone else stated, it doesn't seem to be very lively right now.
I absolutely love, and recommend to any new team members, GitX [0] and other similar Git visualizers. It's incredibly valuable to be able to instantly see the Merkle Tree drawn out and say "oh, the reason my current HEAD isn't picking up thing X that I thought I'd merged in, is because thing X isn't visually an ancestor of my HEAD even though temporally it might have happened earlier than my most recent commit."
I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood. Github, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.
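(For anyone who wants that graph view without installing a GUI visualizer: stock `git log` flags draw the same picture as ASCII art in the terminal. These are standard flags, nothing exotic.)

```shell
# Draw the commit DAG as ASCII art: one line per commit, with
# branch/tag labels shown, across all refs rather than just HEAD.
git log --graph --oneline --decorate --all
```

Run it inside any repository; merges and branch points show up as forks in the ASCII graph, which makes the "is X actually an ancestor of my HEAD?" question answerable at a glance.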
(Full disclosure: my love of visualizations of commit graphs may very much be influenced by the game Fringer [1], which was a formative part of my childhood!)
> I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood.
When I have run into this, it has almost always been because what I want is “take this commit and move it here” and the command for it is some sort of git frobnicate --cached --preserve-stashed that you look up online. “It’s a graph” is great and takes like thirty seconds to explain, but once you’re done with that, it provides almost no insight into how you’re actually supposed to interact with the porcelain to get the graph into the state you want it to be.
In the simple case, yes. But what if it’s a merge commit? What if there is a conflict and I actually want the result to split into two clean diffs that I’ll specify (so that both compile, of course). What if I want it inserted somewhere in the middle of a branch? What if the branch doesn’t exist yet but I want to do this to a remote branch without having to check out something new locally? What if I want to apply the change but lie about the author?
Git is able to do all these things, and I am actually quite pleased that it can support all of these strange workflows. But it still isn’t at all obvious how you’d get these to work if you know the operation to apply a commit was “git cherry-pick”. (I have also noticed that “git rebase” is often a, if not the answer to every “how do I fix my tree” question. But it’s certainly not advertised as such, which is beyond me.)
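For the record, the simple case really is one command, and even the “lie about the author” case is only two. A minimal sketch in a throwaway repo (the branch names, file names, and author identity here are all made up):

```shell
set -e
git init -q -b main demo && cd demo
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "A"

# Make commit B on a topic branch.
git checkout -q -b topic
echo hello > file.txt && git add file.txt
git -c user.name=t -c user.email=t@t commit -q -m "B"

# "Take this commit and move it here": copy B onto main.
git checkout -q main
git -c user.name=t -c user.email=t@t cherry-pick topic

# "Apply the change but lie about the author."
git -c user.name=t -c user.email=t@t commit --amend -q --no-edit \
    --author="Someone Else <someone@example.com>"
git log -1 --format='%an %s'   # → Someone Else B
```

(The harder cases the parent lists have real flags too: `git cherry-pick -m 1 <merge>` picks a merge commit relative to its first parent, and `git rebase --onto <newbase> <oldbase> <branch>` transplants a whole range. That they exist doesn't make them discoverable, which is rather the parent's point.)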
Right, if you want something complicated then it will be complicated, but saying that “take this commit and move it here” is complex is definitely false.
The point I’m trying to make is that I’m very opposed to people who go “the tree is so simple, if you understand it you’ll know how to use Git”. No, the tree is simple; the tools to work with it are not. None of the things I described have complicated end states, because in general you can’t tell, from the final graph you get out of it, how much work went into reaching a particular state.
not to be trite, but the end result of this intuitive system seems to be bolting on even more math and made-up acronyms, both of which beginners really struggle with, and which most non-college-educated journeyman devs misunderstand because of the esoteric nature of their education.
Strong agree for git rebase. When I write docs/teach folks about git, I almost never even mention merging. It's just generally very rarely useful, and I find it kind of makes git as a whole more of a black box. "git rebase -i" gives a much clearer picture of what's happening, and will let you solve significantly more complicated problems with the same tools and mental model.
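A minimal, scriptable sketch of that workflow (the `GIT_SEQUENCE_EDITOR=true` trick just auto-accepts the todo list so the example runs unattended; normally `git rebase -i` opens it in your editor):

```shell
set -e
git init -q -b main demo && cd demo
for m in one two three; do
  git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "$m"
done

# `git rebase -i HEAD~2` presents the last two commits as a todo list;
# each line can be pick / reword / squash / fixup / drop / edit.
# Auto-accept the default todo (all "pick") so this runs non-interactively.
GIT_SEQUENCE_EDITOR=true git -c user.name=t -c user.email=t@t rebase -i HEAD~2
git log --oneline    # the same three commits, each now editable at will
```

The mental model stays the same one the visualizers teach: a rebase replays a linear slice of the graph, one commit at a time, and the todo list is just that slice written out.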
TRIED: git log —-graph —-oneline —-decorate
GOT:
fatal: ambiguous argument '—-graph': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Retype the dashes; you've posted something different from what GP wrote (an em dash or en dash or something), presumably because your browser/clipboard/terminal did something funky when you copied.
Yes, the visualizers are key. My experience is that every team needs approximately one Git guru who can run basic training sessions, dictate the workflow (branching/tagging/etc), and fix things when they go wrong. Otherwise you get stuck with a bunch of people memorizing Git commands and creating some unusable history.
And while I say “Git”, it’s really the same situation for any VCS, in my experience. I think the underlying problem that a VCS solves is the truly complicated part here.
> And while I say “Git”, it’s really the same situation for any VCS, in my experience.
Sorry, but no. I have taught CVS, Subversion, and Mercurial to executives, artists and students. They have no problem with the mental model.
With git, people with a Master's in CS get screwed up.
Having "working", "staging" and "repository" concepts is the problem. Maybe "staging" makes Linus' life easier, but unfortunately git escaped to the common people and "staging" makes life miserable for the 99% of normal use cases.
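For readers who haven't seen the three places spelled out: edits live in the working tree, `git add` copies them into the staging area (the index), and `git commit` records only what's staged. That indirection is exactly what lets you commit some of your edits and not others (file names here are made up):

```shell
set -e
git init -q -b main demo && cd demo
echo one > a.txt
echo two > b.txt

git add a.txt        # stage a.txt; b.txt stays only in the working tree
git -c user.name=t -c user.email=t@t commit -q -m "add a.txt"

git status --short   # → "?? b.txt": never staged, so never committed
git ls-files         # → "a.txt": the only file the repository knows about
```

Whether that flexibility is worth the extra concept for the 99% case is, of course, the parent's complaint.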
Given the number of times I've had to go in and rescue someone using e.g. SVN or Hg, I can't say I've had the same experience.
The major problem is that as soon as you have a team of people with the same repo checked out, you have as many branches as you have people. These branches may not have explicit representations in the underlying VCS, but they exist just the same.
And so then you're dealing with scary "merge conflicts" for work that people have, from their perspective, already done but can't commit and push out.
Subversion is simple to understand because it is simple, and relatively incapable. If you only use git like you would use Subversion, it's simple too. Subversion is much less easy when you have to do something like merging a long-lived branch.
You conveniently leave Mercurial out of that list. Mercurial is as powerful as git is--in a few cases, arguably more so (phases make history rewriting safe!)--and yet there is pretty objectively far less confusion for newbies than git has.
There's ample evidence that git is unnecessarily complicated for the DVCS model it uses.
I found that the visualizers made it much more difficult to learn git for me. I couldn't make sense of what they were showing. It was just a bunch of lines of wildly different colors that made no sense. (I think part of it was that in most applications and when drawing things on paper, time goes left to right, but the visualizers always draw them top to bottom.)
> Github, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.
I feel so alone every time I say this, so thanks.
If you give me half an hour (well, I suck at time estimates... a few hours, maybe) with someone, I can fix their thinking about this tool. But I don't have however many hours with the world.
On Ubuntu the older version of gitg is very easy to read as well.
I like Sublime Merge but can’t for the life of me visually understand anything from the way it displays branches. gitg is way easier, particularly in that it doesn’t mix a bunch of unrelated branches chronologically.
Like, i don’t care that there is a stack that branched from here or there, let me just see quickly what THIS branch i’m working on "grows" from.
This is not true. Which version of GitX are you using? There are a number of different forks, and the one I get from “brew cask install rowanj-gitx” works fine on Catalina. It is from 2014, but it’s code-signed and it’s 64-bit.
If you are using the older fork from http://gitx.frim.nl/ then it’s a 32-bit binary from 2009. That version won’t work on current macOS versions, but it will run on my Power Mac.
If you've ever used Darcs, that might help motivate why Pijul is interesting. Darcs was the first system I used with an interactive mode, sort of like `git add -i`. Obviously that's a UX-side change that can be (and has been) replicated in Git. But at the time it was fairly mind-blowing to work that way.
The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired. So Pijul could be really cool if it finally gets this to really work.
On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs. So I'm not sure if Pijul would have any practical day to day consequences for me. Certainly, when I moved from Darcs to Git, aside from UX issues, I don't think I noticed any major practical headaches due to the loss of the theory of patches.
You left out the thing that was awesome about darcs, that pijul could do too: real cherry-picking, where picking a change also tracked down the other patches it depended on. You could literally pull whole features from one tree to another, not just a single commit.
> On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs.
With the trend towards automatic code formatting, I don’t think that would be difficult to do.
> The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired.
For what it's worth, there's a Darcs 3 in development, with a new patch format/theory, thanks to the two people keeping it alive. I find darcs 2 generally pleasant enough with a fairly large code base. I didn't understand the reason for not keeping the darcs interface with new guts for Pijul.
Yep. I’ve used darcs for at least a decade. The mental model is just so straightforward, I simply never wrestle with it. It does exactly what I want with very little thought. I’ve transitioned to git this last year and my head hurts constantly.
Things that used to be trivial are now unsolvable (by me at least).
The darcs ui was a complete joy. Interactive but super fast. Incredibly easy for new users to learn.
Conflicts can't be avoided (certainly not by tree-diffing), and aren't an error state ("conflict" is a bad name because it sounds like it is). The useful innovation of pijul is that conflicted states are not an exceptional state - you can continue to apply patches.
I understand that conflicts are inevitable, but wouldn't tree-diffing at least be an improvement over line-diffing? I recall that Pijul's theory of generalized files (arbitrary digraphs of lines IIRC) is already fairly complicated though.
It wouldn't really make much of a difference, if any. For source files, anyway - more complicated diffing helps for files we consider just "binary" now but that are actually structured.
Pijul's pushouts are unrelated - that just allows a line in a file to be ambiguous, rather than definitely being one line.
You say Pijul's pushouts are unrelated, but their construction depends on a very line-centric definition of patches. Wouldn't it need to be made more complex to accomodate tree patches?
No? The concept is that a unit of diffing (a line, or a tree node in your hypothetical tree-diffing approach) can be ambiguous until another patch resolves the ambiguity.
In the vaguest sense, sure. But if a file is a list of lines, this "ambiguous file" is a digraph of lines. If a file is instead a tree of strings, what does an "ambiguous file" look like? See this paper which was the source of some of the main ideas of Pijul, and in particular, note that its extension to structured data is listed as "future work", which means it probably hasn't been done yet.
A tool that handles a frequent but not particularly challenging problem is still a net win. Humans make errors. The more times I have to do something manually, the higher the likelihood I have screwed one of them up. I don't expect to get better at doing a task the 101st time. But I do expect the odds that one of them gets cocked up to climb ever so slightly. Better if the machine can just do it.
If the majority of the code is written by middle-of-the-road team members, then most of the merges will be done by those same people. Something that never helps me with my changes still helps me, due to my shared responsibility for the project. This is an often overlooked aspect of the tool selection process.
The CLI is pretty straightforward. First, think of what you want to do, then do it. Consult only the man pages when performing the latter, never some random guy's blog or a confused Stack Overflow post.
Clearing the confused Stack Overflow posts &c. from your mental cache will make it all make sense.
I'm sorry, that is completely useless advice. The manpages randomly bumble around in the abstraction level of their "explanations" and rarely use words that the average person would think of first to describe something.
sorry, i'm gonna sound like i'm making an argument from authority, but why do man pages have to appeal to the "average" person anyway?
complex tasks require complex explanations to avoid weird hangups/errors.
every time i see people complain about the complexity of manpages, i always wonder what their work looks like, if manpages are the blocking issue to their understanding of a tool.
I would agree that most manpages are pretty useful and that they don't necessarily need to be tutorials (it would be against Unix traditions, haha). Git's manpages and CLI are just a particularly bad word soup.
> A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique
As no-one else has commented: One might take the full Hoare quotation a different way, not referring to the simplest thing that could possibly work (and possibly not work well). "[T]here are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. [...]".
He also wrote somewhere -- which I can't now find -- about engineering in terms of producing an implementation that satisfies an initial predicate. In this context, perhaps he'd consider the difficult part to be a theory of the simple model of a set of patches as a design, with obviously (provably) no deficiencies in the required merge behaviour and simplicity in its use (c.f. git). Or perhaps he wouldn't, but a formal methods pioneer would presumably approve of a sound theory behind the implementation.