Does this algorithm find the *shortest* edit script? Finding any diff is not rea...

mattb314 · on Oct 29, 2019

This is basically the issue. If a single line from the bottom of left is present at the top of right, it will fast-forward all the way through left and fail to match anything else from right. The hard part with diffing isn't finding differences, it's deciding whether they're inserts, edits, or deletions.
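To make that failure mode concrete, here is a minimal sketch. `greedy_diff` is a hypothetical forward-scanning matcher written only for illustration; it is an assumption about how such an approach might behave, not the algorithm from the post. `shortest_edit_count` uses the textbook longest-common-subsequence table for comparison.

```python
def greedy_diff(left, right):
    """Hypothetical greedy matcher: for each right line, scan forward in left."""
    ops = []
    i = 0  # next unconsumed index in left
    for line in right:
        try:
            j = left.index(line, i)  # first match at or after i
        except ValueError:
            ops.append(("+", line))  # no match ahead: treat as an insert
            continue
        ops.extend(("-", skipped) for skipped in left[i:j])  # skipped lines become deletes
        ops.append(("=", line))
        i = j + 1
    ops.extend(("-", rest) for rest in left[i:])
    return ops


def lcs_length(left, right):
    """Length of the longest common subsequence, via the standard DP table."""
    prev = [0] * (len(right) + 1)
    for a in left:
        cur = [0]
        for k, b in enumerate(right, 1):
            cur.append(prev[k - 1] + 1 if a == b else max(prev[k], cur[k - 1]))
        prev = cur
    return prev[-1]


def shortest_edit_count(left, right):
    """Inserts plus deletes in a shortest edit script."""
    return len(left) + len(right) - 2 * lcs_length(left, right)


left = ["a", "b", "c", "d"]
right = ["d", "a", "b", "c"]  # the last line of left moved to the top

greedy_edits = sum(1 for op, _ in greedy_diff(left, right) if op != "=")
print(greedy_edits, shortest_edit_count(left, right))
```

With these inputs the greedy matcher jumps straight to `d`, deletes `a`, `b`, `c`, then re-inserts them, while the shortest script only needs one insert and one delete.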

keithwhor · on Oct 29, 2019

Nope! That's related to the Longest Common Substring Problem I think, I mentioned it in another comment [0]. Though I have a feeling you could find the shortest edit script with a few easy tweaks, I might check it out. I imagine the tweaks are going to make the difference between O(N) and O(ND) though.

The problem I had was that I just wanted a simple diff! It's for HTML rendering and not ultra complex at that: usually folks are only changing a few lines at a time, max, between any re-render.

[0] https://en.wikipedia.org/wiki/Longest_common_substring_probl...

dllthomas · on Oct 29, 2019

I think shortest is probably a good proxy if we can't find something better but I think in an editor "easiest to understand" is probably what's actually important.

Not that I have any particular expectation that this algorithm does unexpectedly well at that.

eru · on Oct 29, 2019

Yes. That reminds me of lossy image compression: the real goal is to find a representation that looks as close as possible to the input image to the human eye. In practice, we use proxies.

Not all proxies are equally good. E.g., just going by Hamming distance or L2 norm distance would be awful. Proxies that take more features of our visual system into account are going to fare better.
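A toy illustration of why raw L2 distance can mislead, assuming a 1-D "image" of fine stripes (the signals and names below are made up for the example): shifting the stripes by one pixel leaves the texture looking much the same, yet L2 rates a featureless gray as the closer match.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length pixel rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


stripes = [0.0, 1.0] * 8              # fine black/white stripes, 16 pixels
shifted = stripes[1:] + stripes[:1]   # same stripes, moved by one pixel
flat_gray = [0.5] * 16                # all detail thrown away

d_shifted = l2(stripes, shifted)
d_gray = l2(stripes, flat_gray)
print(d_shifted, d_gray)
```

Every pixel of the shifted row differs by 1, so its distance is larger than that of the flat gray row, where every pixel differs by only 0.5.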