
This 100%. It does get interesting when you get into non-plaintext things that have to somehow integrate into plaintext systems (git managed codebases). We've kind of left it up to CMS systems to handle the non-plaintext bits but this leads to many more orthogonal process problems.

IMO it really comes down to finding a universal mechanism for diffing and 3-way merging things that aren't plain text (document diffing). I think distributed version control can be universal, at least at the data level; how an application renders a meaningful diff is highly specific to the document type and the task at hand. I completely agree that plaintext makes a whole lot of sense for programmers and pretty much nobody else. However, distributed version control does not have to be confined to plaintext. That's just hard to see when all the version control systems we're familiar with are plaintext ones.




Git is popular because it's linear, and the linear paradigm usually translates well to serial things such as programs, instructions, document sets, etc.

It's actually bad at non-linear stuff, which you will have noticed if you have ever worked with hierarchical formats such as XML or nested JSON.
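
To make that concrete, here is a minimal Python sketch (standard library only; the two sample documents and the helper name `changed_lines` are made up for illustration). It compares the same JSON tree serialized two ways: a line-based diff reports changes, while a structural comparison sees no difference.

```python
import difflib
import json

# Two serializations of the same tree: only key order and whitespace differ.
old_text = '{"name": "doc", "style": {"bold": true, "color": "blue"}}'
new_text = """{
  "style": {"color": "blue", "bold": true},
  "name": "doc"
}"""

def changed_lines(old, new):
    """Return the added/removed lines a line-oriented diff would report."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]

text_changes = changed_lines(old_text, new_text)

# Parsing first compares the trees themselves, ignoring layout and key order.
same_tree = json.loads(old_text) == json.loads(new_text)

print(f"line diff reports {len(text_changes)} changed lines")
print(f"trees are structurally equal: {same_tree}")
```

The line diff is not wrong, it is just answering a question about lines, which is not the question you have about a tree.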

Word is bad for a whole litany of reasons, but the reason it can't be easily versioned (on top of the format being a literal Rube Goldberg machine requiring inane transforms to version properly) is that it encodes a bunch of non-linear formatting instructions. Sure, we can sort-of reason about this stuff e.g. with a hierarchical css+html+js structure, but without a way to render that I challenge you to be able to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know to which elements it applies and in what layout. So, suddenly you can't reasonably diff the css file without also diffing the html.

As programmers, we are used to reducing things by their dimensions into fairly linear spaces, which helps us reason fairly linearly about changes, but doing this from any other context is challenging. Lawyers, for example, perhaps focus on the relations between various clauses, so linearizing their document flow is not very important to them, at least when there exist methods to diff the general textual content without them having to invest much in how that is done.

As programmers we see the similarities to editing a code base and that excites us. However, we do have a tendency to go off and write frameworks to parse and simplify these things without ever actually bothering to learn how to apply them. This is not without value, but it's a different focus, which maybe explains why lawyers are not in the habit of using git.


> Sure, we can sort-of reason about this stuff e.g. with a hierarchical css+html+js structure, but without a way to render that I challenge you to be able to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know to which elements it applies and in what layout. So, suddenly you can't reasonably diff the css file without also diffing the html.

We’re in complete agreement. But you can do this, you just need to provide a “renderer” and a schema that describes how your tree structure should merge or conflict. If you want to test out a weird version control for structured data, my email is in my bio.
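
As a rough illustration of the "schema" half of that idea, here is a hedged Python sketch of a three-way merge over nested dicts. Everything in it is hypothetical: `merge3`, the `rules` mapping (standing in for a schema), and the sample documents are invented for this example and are not how any particular tool works. A real system would also need the renderer, list and ordering semantics, and much richer conflict reporting.

```python
MISSING = object()  # sentinel for "key absent on this side"

def merge3(base, ours, theirs, rules=None, path=()):
    """Three-way merge of nested dicts. Returns (merged, conflicts).

    `rules` maps a key path (tuple) to a resolver f(base, ours, theirs),
    playing the role of a schema that says how a node should merge.
    """
    rules = rules or {}
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, MISSING)
        o = ours.get(key, MISSING)
        t = theirs.get(key, MISSING)
        here = path + (key,)
        if o == t:
            value = o                      # both sides agree (or both deleted)
        elif o == b:
            value = t                      # only theirs changed
        elif t == b:
            value = o                      # only ours changed
        elif here in rules:
            value = rules[here](b, o, t)   # schema decides how to combine
        elif isinstance(o, dict) and isinstance(t, dict):
            sub_base = b if isinstance(b, dict) else {}
            value, sub = merge3(sub_base, o, t, rules, here)
            conflicts.extend(sub)
        else:
            conflicts.append(here)         # genuine conflict
            value = o                      # keep ours as a placeholder
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts

base = {"title": "Contract", "style": {"bold": False, "color": "black"}, "tags": ["draft"]}
ours = {"title": "Contract", "style": {"bold": True, "color": "black"}, "tags": ["draft", "legal"]}
theirs = {"title": "Contract v2", "style": {"bold": False, "color": "blue"}, "tags": ["draft", "reviewed"]}

# Hypothetical schema rule: tag lists merge as a set union instead of conflicting.
rules = {("tags",): lambda b, o, t: sorted(set(o) | set(t))}

merged, conflicts = merge3(base, ours, theirs, rules)
print(merged)
print(conflicts)
```

Edits to different leaves of `style` merge cleanly because the merge walks the tree instead of comparing lines, and the `tags` rule shows how a schema can turn what would otherwise be a conflict into a defined merge.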



