
I've done some explorations here.

From the simplest perspective, hard-wrapping lines at a fixed width in a prose markup language that is mostly whitespace-agnostic, like Markdown, produces okay diffs in a line-based diff tool.
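To make that concrete, here is a minimal sketch using only Python's standard library. The function names and the width of 40 are my own choices for illustration, not anything from a particular tool. It hard-wraps each paragraph and then runs an ordinary line-based diff over the wrapped lines.

```python
import difflib
import textwrap


def wrap(text, width=40):
    """Hard-wrap each blank-line-separated paragraph to a fixed width."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


def line_diff(old, new, width=40):
    """Line-based unified diff of two prose texts after hard-wrapping both."""
    a = wrap(old, width).splitlines()
    b = wrap(new, width).splitlines()
    # n=0 drops context lines so only the changed lines are shown.
    return list(difflib.unified_diff(a, b, lineterm="", n=0))


if __name__ == "__main__":
    old = "The quick brown fox jumps over the lazy dog and then runs far away."
    new = "The quick brown fox jumps over the sleepy dog and then runs far away."
    print("\n".join(line_diff(old, new)))
```

The catch is that a one-word edit can push words onto the next line and reflow the rest of the paragraph, so the diff is "okay" rather than minimal.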

I did some work with tokenized diffs: https://github.com/WorldMaker/tokdiff

That particular tool/experiment uses the Pygments tokenizer, the one used for syntax highlighting, and produces interesting, somewhat semantically meaningful code diffs. I think the same principles would apply if you used something like a part-of-speech tagger on prose.
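As a rough illustration of the idea (not tokdiff's actual code), here is a sketch that diffs token sequences instead of lines. To keep it dependency-free it swaps in a crude regex tokenizer where the real tool uses Pygments; the `[-...-]` / `{+...+}` markers and all the names are assumptions for the example.

```python
import difflib
import re

# Crude stand-in tokenizer (an assumption for this sketch): runs of word
# characters, runs of whitespace, or single punctuation characters.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens; joining the tokens gives back the input."""
    return TOKEN.findall(text)


def token_diff(old, new):
    """Inline diff over tokens, marking removals [-x-] and additions {+y+}."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # tokens only in the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # tokens only in the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


if __name__ == "__main__":
    print(token_diff("x = foo(1)", "x = bar(1)"))
```

Swapping `tokenize` for a real lexer (or a part-of-speech tagger for prose) changes what counts as a unit of change without touching the diffing part.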

Taking a different approach, I put some effort into better line-based and file-based diffs of Inform 7, which is a prose format that is less whitespace-agnostic than Markdown and is also built as a single monolithic file, by converting it to an intermediate format.

The code that does this is here: https://github.com/WorldMaker/APrincessOfMoons/blob/master/i...

That works by splitting the file at things that resemble headers, converting newlines to pilcrows (the paragraph symbol), and essentially reformatting to more of a fixed width format. (All of which is trivially reversible.)
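Here is a minimal sketch of that kind of transformation, written from the description above rather than copied from the linked code. The header pattern, the width of 72, and the function names are all assumptions, and it simply refuses input that already contains a pilcrow so the round trip stays exact.

```python
import re

PILCROW = "\u00b6"  # the paragraph symbol

# Hypothetical notion of "resembles a header": a line starting with one of
# the Inform 7 heading words. The real handler may use a different rule.
HEADER = re.compile(r"^(Volume|Book|Part|Chapter|Section)\b", re.IGNORECASE)


def split_sections(text):
    """Split text into chunks, starting a new chunk at each header-like line."""
    sections = [[]]
    # Each match is one line including its trailing newline (if it has one).
    for line in re.findall(r"[^\n]*\n|[^\n]+", text):
        if HEADER.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return ["".join(s) for s in sections]


def encode(section, width=72):
    """Mark original newlines with pilcrows, then break into short lines."""
    if PILCROW in section:
        raise ValueError("input already contains a pilcrow")
    lines = []
    for para in section.replace("\n", PILCROW + "\n").split("\n"):
        lines.extend(para[i:i + width] for i in range(0, len(para), width))
    return "\n".join(lines)


def decode(encoded):
    """Reverse encode(): drop the added newlines, restore the original ones."""
    return encoded.replace("\n", "").replace(PILCROW, "\n")
```

Because every newline in the encoded form is one the encoder added, decoding is just "delete newlines, turn pilcrows back into newlines", and joining the sections back together recovers the original file.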

You can see the commits in that repository as an example of what the intermediate format looks and diffs like.

That format handler is written as a plugin to a venerable tool I wrote called musdex: https://github.com/WorldMaker/musdex

I wrote that to source control zip files as their contents rather than as a binary blob, by automating the zip/unzip operations as part of source control operations (pre- and post-commit hooks). I've used this for some of the modern file formats like .docx, which are built as zip files of XML files and other assets. For instance, you want to write the document in Word, interacting with the .docx, but source control its XML contents. That too gives more useful diffs than source controlling the .docx on its own.
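The core of that round trip can be sketched with Python's standard `zipfile` module. This is not musdex's implementation or its interface, just the bare idea; `explode` and `rebuild` are made-up names, and wiring them into commit hooks is left out.

```python
import zipfile
from pathlib import Path


def explode(archive_path, out_dir):
    """Extract a zip-based file (e.g. a .docx) into out_dir.

    Returns the member names in archive order so the archive can be
    rebuilt with the same layout later.
    """
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def rebuild(src_dir, archive_path, names):
    """Re-create the archive from the extracted files, in the given order."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(Path(src_dir) / name, arcname=name)
```

You would commit the extracted directory (plus the list of names) instead of the archive, and rebuild the archive after a checkout. The rebuilt file is not guaranteed to be byte-identical to the original, only to contain the same members.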


