
I was digging into tutorials about Git internals recently and was surprised to learn just how straightforward the data structures are. Given how stupid-simple the data is, I'm all the more confused about how straightforward the CLI isn't.

There are very few if any ways to accumulate and amplify errors in the system. It's no wonder Linus created a working prototype so fast. It's the simplest thing that could possibly work. But this simplicity is also the reason why binaries occupy so much space.
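For anyone who wants to see how simple the object model is, here is a minimal sketch of how a blob gets its ID: SHA-1 over a short header plus the raw bytes. It assumes `sha1sum` from coreutils is on the PATH (on macOS you would probably need `shasum` instead) and a repository using the default SHA-1 object format. The function name `git_blob_id` is made up for illustration.

```shell
#!/bin/sh
# Print the object ID Git would assign to a blob holding the bytes of file $1.
# A blob is hashed as the header "blob <size>\0" followed by the raw content.
git_blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')   # byte count; strip padding some wc versions add
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Demo: hash a 12-byte file.
demo_file=$(mktemp)
printf 'hello world\n' > "$demo_file"
git_blob_id "$demo_file"   # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
rm -f "$demo_file"
```

If Git is installed, `git hash-object <file>` should print the same ID. Note that the ID covers the whole file, so changing one byte of a large binary produces an entirely new object. Packfiles can delta-compress afterwards, but that is a separate layer on top of this model.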

A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique, and then a group of people working together to match the implementation verbatim to the proven technique.

Pijul seems to be stepping into that middle space, but it's not clear to me if we're going to follow, or if something else like it will get us moving. I do like the concept, but as someone else stated, it doesn't seem to be very lively right now.




I absolutely love, and recommend to any new team members, GitX [0] and other similar Git visualizers. It's incredibly valuable to be able to instantly see the Merkle Tree drawn out and say "oh, the reason my current HEAD isn't picking up thing X that I thought I'd merged in, is because thing X isn't visually an ancestor of my HEAD even though temporally it might have happened earlier than my most recent commit."

I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood. GitHub, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.
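The "is X an ancestor of my HEAD" question can also be asked directly on the command line. A minimal sketch using `git merge-base --is-ancestor`; the wrapper `is_ancestor`, the helpers `commit_empty` and `demo_repo`, and the branch names `trunk` and `thing-x` are all invented for the example.

```shell
#!/bin/sh
# Exit 0 when commit $1 is an ancestor of commit $2 (default: HEAD),
# i.e. when $1 can be reached from $2 by following parent links.
is_ancestor() {
    git merge-base --is-ancestor "$1" "${2:-HEAD}"
}

# Empty commit with a throwaway identity, so no global config is needed.
commit_empty() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$1"
}

# Build a scratch repo in which "thing X" is older than the newest commit
# on trunk but lives on a branch that was never merged.
demo_repo() {
    cd "$(mktemp -d)" || return 1
    git init -q .
    git symbolic-ref HEAD refs/heads/trunk   # fixed branch name, whatever the default is
    commit_empty "base"
    git checkout -q -b thing-x
    commit_empty "thing X"
    git checkout -q trunk
    commit_empty "my most recent commit"
}
```

After `demo_repo`, `is_ancestor thing-x` fails (exit status 1) even though thing X was committed first, while `is_ancestor HEAD~1` succeeds. That is the same fact a graph visualizer shows at a glance.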

(Full disclosure: my love of visualizations of commit graphs may very much be influenced by the game Fringer [1], which was a formative part of my childhood!)

[0] https://rowanj.github.io/gitx/

[1] https://www.youtube.com/watch?v=mAV7IioO_t8


> I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood.

The reason I have run into this has almost always been because what I want is “take this commit and move it here” and the command for it is some sort of git frobnicate --cached --preserve-stashed that you look up online. “It’s a graph” is great and takes like thirty seconds to explain but once you get that done with it provides almost no insight in how you’re actually supposed to interact with the porcelain to get that graph in the state you want it to be.


> “take this commit and move it here”

Isn't that just `git cherry-pick $COMMITID`?


In the simple case, yes. But what if it’s a merge commit? What if there is a conflict and I actually want the result to split into two clean diffs that I’ll specify (so that both compile, of course). What if I want it inserted somewhere in the middle of a branch? What if the branch doesn’t exist yet but I want to do this to a remote branch without having to check out something new locally? What if I want to apply the change but lie about the author?

Git is able to do all these things, and I am actually quite pleased that it can support all of these strange workflows. But it still isn’t at all obvious how you’d get these to work if you know the operation to apply a commit was “git cherry-pick”. (I have also noticed that “git rebase” is often an answer, if not the answer, to every “how do I fix my tree” question. But it’s certainly not advertised as such, which is beyond me.)
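To make one of those cases concrete, here is a minimal sketch of "apply the change but with a different author": cherry-pick first, then amend with `--author`. The function names `pick_as` and `demo_pick`, the branch names, and the identities are hypothetical, and this is only one of several ways to get there.

```shell
#!/bin/sh
# Apply commit $1 onto the current branch, then rewrite the author field.
# $2 has the form "Name <email>".
pick_as() {
    git cherry-pick "$1" >/dev/null &&
    git commit -q --amend --no-edit --author="$2"
}

# Scratch repo: trunk has commit A, branch "side" adds commit B on top.
demo_pick() {
    cd "$(mktemp -d)" || return 1
    # Throwaway identity so the demo does not depend on global config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
    git init -q .
    git symbolic-ref HEAD refs/heads/trunk
    echo A > a && git add a && git commit -q -m A
    git checkout -q -b side
    echo B > b && git add b && git commit -q -m B
    git checkout -q trunk
    pick_as side 'Someone Else <someone@example.invalid>'
}
```

After `demo_pick`, `git log -1 --format='%an <%ae>'` on trunk reports the overridden author. The merge-commit and split-into-two-diffs cases need different commands again, which is rather the point being made above.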


Right, if you want something complicated then it will be complicated, but saying that “take this commit and move it here” is complex is definitely false.


The point I’m trying to make is that I’m very opposed to people who go “the tree is so simple, if you understand it you’ll know how to use Git”. No, the tree is simple, the tools to work with it are not. None of the things I described have complicated end states, because in general you can’t tell how much work goes into getting to a particular state from the final graph you get out of it.


not to be trite but the end result of this intuitive system seems to be to bolt on even more math and made up acronyms - both of which beginners really struggle with and most non college educated journeymen devs misunderstand coz of the esoteric nature of their education.

looking forward to

'merge patch graggle revert --flatten'

posts littering stack overflow in the future


Strong agree for git rebase. When I write docs/teach folks about git, I almost never even mention merging. It's just generally very rarely useful, and I find it kind of makes git as a whole more of a black box. "git rebase -i" gives a much clearer picture of what's happening, and will let you solve significantly more complicated problems with the same tools and mental model.
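For the "take this commit and move it here" case discussed above, the non-interactive cousin of `git rebase -i` is `git rebase --onto`. A minimal sketch, with hypothetical helper names (`move_commits`, `add_file`, `demo_move`) and branch names:

```shell
#!/bin/sh
# Replay the commits in <old-base>..<branch> on top of <new-base>.
move_commits() {   # usage: move_commits <new-base> <old-base> <branch>
    git rebase -q --onto "$1" "$2" "$3"
}

# Commit a one-line file named $1 whose content and message are both $2.
add_file() {
    echo "$2" > "$1" && git add "$1" && git commit -q -m "$2"
}

demo_move() {
    cd "$(mktemp -d)" || return 1
    # Throwaway identity so the demo does not depend on global config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
    git init -q .
    git symbolic-ref HEAD refs/heads/trunk
    add_file a A                        # trunk:    A
    git checkout -q -b old-base
    add_file b B                        # old-base: A-B
    git checkout -q -b topic
    add_file c C                        # topic:    A-B-C
    move_commits trunk old-base topic   # topic becomes A-C'
}
```

After `demo_move`, `git log --format=%s trunk..topic` lists only C: the commit has been lifted off B and replayed directly onto trunk. Same graph model, just without the editor step.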


Have you tried the gitless porcelain?


I have not, although I took a look at some point. I really like staging areas :(


Another quick visualization method built into Git:

  git log —-graph —-oneline —-decorate

  # —-graph for the visualization
  # —-oneline for compact commits
  # —-decorate for tag and branch refs

You can also add:

  # —-all to include all branches

I typically add aliases to ~/.gitconfig to use these options by default.
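For the alias part, a minimal sketch, typed with plain ASCII hyphens. The alias name `lg` and the helper `add_graph_alias` are placeholders. Writing through `git config --file` is only so the sketch can be pointed at a scratch file; `git config --global` would edit ~/.gitconfig directly.

```shell
#!/bin/sh
# Add an alias "lg" to the Git config file $1 (e.g. "$HOME/.gitconfig").
add_graph_alias() {
    git config --file "$1" alias.lg 'log --graph --oneline --decorate'
}
```

After `add_graph_alias "$HOME/.gitconfig"`, `git lg` runs the full command and `git lg --all` includes all branches.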


TRIED: git log —-graph —-oneline —-decorate GOT: fatal: ambiguous argument '—-graph': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]'


Retype the dashes. What you pasted is not what GP posted: there's an em dash or en dash or something in there. Presumably your browser/clipboard/term did something funky when you copied.
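A quick way to check a pasted command for this before running it, as a rough sketch: flag any byte outside printable ASCII. The helper name `has_non_ascii` is made up, and it assumes a grep that matches raw bytes under `LC_ALL=C` (GNU grep does, as far as I know).

```shell
#!/bin/sh
# Exit 0 if string $1 contains any byte outside printable ASCII (space..tilde).
# LC_ALL=C makes grep compare raw bytes, so each byte of a UTF-8 dash is caught.
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

if has_non_ascii 'git log —-graph'; then
    echo "non-ASCII characters found: retype the dashes"
fi
```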


Should be:

    git log --graph --oneline --decorate


Thanks for correcting my error. Stupid “smart” dashes on my phone.


Yes, the visualizers are key. My experience is that every team needs approximately one Git guru who can run basic training sessions, dictate the workflow (branching/tagging/etc), and fix things when they go wrong. Otherwise you get stuck with a bunch of people memorizing Git commands and creating some unusable history.

And while I say “Git”, it’s really the same situation for any VCS, in my experience. I think the underlying problem that a VCS solves is the truly complicated part here.


> And while I say “Git”, it’s really the same situation for any VCS, in my experience.

Sorry, but no. I have taught CVS, Subversion, and Mercurial to executives, artists and students. They have no problem with the mental model.

With git, people with a Master's in CS get screwed up.

Having "working", "staging" and "repository" concepts is the problem. Maybe "staging" makes Linus' life easier, but unfortunately git escaped to the common people and "staging" makes life miserable for the 99% of normal use cases.


Given the number of times I've had to go in and rescue someone using e.g. SVN or Hg, I can't say I've had the same experience.

The major problem is that as soon as you have a team of people with the same repo checked out, you have as many branches as you have people. These branches may not have explicit representations in the underlying VCS, but they exist just the same.

And so then you're dealing with scary "merge conflicts" for work that people have, from their perspective, already done but can't commit and push out.


Subversion is simple to understand because it is simple, and relatively incapable. If you only use git like you would use Subversion, it's simple too. Subversion is much less easy when you have to do something like merging a long-lived branch.


You conveniently omit the inclusion of mercurial in the list. Mercurial is as powerful as git is--in a few cases, arguably more so (phases make history rewriting safe!)--and yet there is pretty objectively far less confusion for newbies than git has.

There's ample evidence that git is unnecessarily complicated for the DVCS model it uses.


I found that the visualizers made it much more difficult to learn git for me. I couldn't make sense of what they were showing. It was just a bunch of lines of wildly different colors that made no sense. (I think part of it was that in most applications and when drawing things on paper, time goes left to right, but the visualizers always draw them top to bottom.)


> GitHub, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.

I feel so alone every time I say this, so thanks.

If you give me half an hour (well, I suck at time estimates... a few hours, maybe) with someone, I can fix their thinking about this tool. But I don't have however many hours with the world.


On Ubuntu the older version of gitg is very easy to read as well.

I like Sublime Merge but can’t for the life of me visually understand anything from the way it displays branches. gitg is way easier, particularly in that it doesn’t mix a bunch of unrelated branches chronologically.

Like, i don’t care that there is a stack that branched from here or there, let me just see quickly what THIS branch i’m working on "grows" from.


> I absolutely love, and recommend to any new team members, GitX and other similar Git visualizers.

Is there a GUI for git blameall or similar functionality with clickable commits? http://1dan.org/git-blameall/


You have this in vscode I think...


I'd rather recommend Guitar.

https://github.com/soramimi/Guitar

Nice, simple and multi-platform.


Does anyone know a good visualizer for Linux? I find that half the time something goes wrong it's because I don't know the state of the graph.


gitk (comes with Git, part of the Git distribution, your OS packager may have split it out into some package like git-x11)


magit in emacs does a fairly good job,

    git log --decorate --graph --all
I use all the time (aliased everything except --all as gl)

Also, tig is a nice standalone implementation of a magit like tool.


https://github.com/soramimi/Guitar

Nice, simple and multi-platform.


Gitg. Similar to gitk but with a nicer GTK GUI.


GitAhead works for me


Just an FYI that GitX won't work on an up to date Mac


This is not true. Which version of GitX are you using? There are a number of different forks, and the one I get from “brew cask install rowanj-gitx” works fine on Catalina. It is from 2014, but it’s code-signed and it’s 64-bit.

If you are using the older fork from http://gitx.frim.nl/ then it’s a 32-bit binary from 2009. That version won’t work on current macOS versions, but it will run on my Power Mac.


Yep, rowanj is the one that is up to date, but to be fair it is not the first result on Google.


Here's the fork which I use:

https://rowanj.github.io/gitx/ (`brew cask install rowanj-gitx`)

The upstream also has some signs of life but I don't believe it has a stable release yet:

https://github.com/gitx/gitx


If you've ever used Darcs, that might help motivate why Pijul is interesting. Darcs was the first system I used with an interactive mode, sort of like `git add -i`. Obviously that's a UX-side change that can be (and has been) replicated in Git. But at the time it was fairly mind-blowing to work that way.

The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired. So Pijul could be really cool if it finally gets this to really work.

On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs. So I'm not sure if Pijul would have any practical day to day consequences for me. Certainly, when I moved from Darcs to Git, aside from UX issues, I don't think I noticed any major practical headaches due to the loss of the theory of patches.


You left out the thing that was awesome about darcs, that pijul could do too: real cherry pick, where picking a change also tracked down the other patches needed for that. You could literally pull whole features from one tree to another, not just a commit.

It was marvelous with darcs.


Isn't that describing `git merge <commit>`? (The whole point of cherry-pick being that you specifically DON'T want anything but the one commit)

Edit: Never mind, I see that darcs patches are not equivalent to git commits (and maybe not to anything in git).


> On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs.

With the trend towards automatic code formatting, I don’t think that would be difficult to do.


I think this is where things could go—VCS aware of semantics—which could happen if syntax==semantics.


> The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired.

For what it's worth, there's a Darcs 3 in development, with a new patch format/theory, thanks to the two keeping it alive. I find Darcs 2 generally pleasant enough with a fairly large code base. I didn't understand the reason for not keeping the Darcs interface with new guts for Pijul.


Yep. I’ve used darcs for at least a decade. The mental model is just so straightforward, I simply never wrestled with it. It does exactly what I want with very little thought. I’ve transitioned to git this last year and my head hurts constantly.

Things that used to be trivial are now unsolvable (by me at least).

The darcs UI was a complete joy. Interactive but super fast. Incredibly easy for new users to learn.


Conflicts can't be avoided (certainly not by tree-diffing), and aren't an error state ("conflict" is a bad name because it sounds like it is). The useful innovation of pijul is that conflicted states are not an exceptional state - you can continue to apply patches.


I understand that conflicts are inevitable, but wouldn't tree-diffing at least be an improvement over line-diffing? I recall that Pijul's theory of generalized files (arbitrary digraphs of lines IIRC) is already fairly complicated though.


It wouldn't really make much of a difference, if any. For source files, anyway - more complicated diffing helps for files we consider just "binary" now but that are actually structured.

Pijul's pushouts are unrelated - that just allows a line in a file to be ambiguous, rather than definitely being one line.


You say Pijul's pushouts are unrelated, but their construction depends on a very line-centric definition of patches. Wouldn't it need to be made more complex to accommodate tree patches?


No? The concept is that a unit of diffing (a line, or a tree node in your hypothetical tree-diffing approach) can be ambiguous until another patch resolves the ambiguity.


In the vaguest sense, sure. But if a file is a list of lines, this "ambiguous file" is a digraph of lines. If a file is instead a tree of strings, what does an "ambiguous file" look like? See this paper which was the source of some of the main ideas of Pijul, and in particular, note that its extension to structured data is listed as "future work", which means it probably hasn't been done yet.

https://arxiv.org/pdf/1311.3903.pdf


I don't think you have understood the pushouts paper. Look at figure 4 again.


Which one's figure 4? They're not numbered.

Also, can you point out what my misunderstanding is? That'd help a lot.


A tool that handles a frequent but not particularly challenging problem is still a net win. Humans make errors. The more times I have to do something manually, the higher the likelihood I have screwed one of them up. I don't expect to get better at doing a task the 101st time. But I do expect the odds that one of them gets cocked up to climb ever so slightly. Better if the machine can just do it.

If the majority of the code is written by middle-of-the-road team members, then most of the merges will be done by those same people. Something that never helps me with my changes still helps me, due to my shared responsibility for the project. This is an often overlooked aspect of the tool selection process.


The CLI is pretty straightforward. First, think of what you want to do, then do it. Consult only the man pages when performing the latter, never some random guy's blog or a confused Stack Overflow post.

Clearing the confused Stack Overflow posts &c. from your mental cache will make it all make sense.


I'm sorry, that is completely useless advice. The manpages randomly bumble around in the abstraction level of their "explanations" and rarely use words that the average person would think of first to describe something.


I can't really reply to this; it just isn't true, and the text is right there for anyone to read if they want to see for themselves.


This is so "true" that there's even a realistic generator:

https://git-man-page-generator.lokaltog.net/


Ha! Best laugh I've had in a month!


sorry i'm gonna sound like i'm making an argument for authority but why do man pages have to appeal to the "average" person anyways? complex tasks require complex explanations to avoid weird hangups/errors.

every time i see people complain about the complexity of manpages i always wonder what their work looks like if manpages is the blocking issue to their understanding of a tool


I would agree that most manpages are pretty useful and that they don't necessarily need to be tutorials (it would be against Unix traditions, haha). Git's manpages and CLI are just a particularly bad word soup.


TODO: man page search engine with semantic term substitution (e.g. BERT). Possibly trained on Stack Overflow.


> A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique

As no-one else has commented: One might take the full Hoare quotation a different way, not referring to the simplest thing that could possibly work (and possibly not work well). "[T]here are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. [...]".

He also wrote somewhere -- which I can't now find -- about engineering in terms of producing an implementation that satisfies an initial predicate. In this context, perhaps he'd consider the difficult part to be a theory of the simple model of a set of patches as a design, with obviously (provably) no deficiencies in the required merge behaviour and simplicity in its use (c.f. git). Or perhaps he wouldn't, but a formal methods pioneer would presumably approve of a sound theory behind the implementation.


"A system can be so simple that there are obviously no errors, or so complex that no errors are obvious."

I was gonna compliment you on this gem, but it sounded familiar, and, sure enough -- paraphrased Tony Hoare. Great comment, regardless.


> But this simplicity is also the reason why binaries occupy so much space

Don't most repos now use Git Large File Storage (LFS) to prevent binaries taking so much space?



