Hey folks,
I've been building a rendering engine for a code editor the past couple of days. Rendering huge chunks of highlighted syntax can get laggy. It's not worth switching to React at this stage, so I wanted to just write a quick diff algorithm that would selectively update only changed lines.
I found this article:
https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
With a link to this paper, the initial Git diff implementation:
http://www.xmailserver.org/diff2.pdf
I couldn't find the PDF to start with, but read "edit graph" and immediately thought — why don't I just use a hashtable to store lines from LEFT_TEXT and references to where they are, then iterate over RIGHT_TEXT and return matches one by one, also making sure that I keep track of the last match to prevent jumbling?
The algorithm I produced is only a few lines and seems accurate. It's O(N) time complexity, whereas the paper above gives a best case of O(ND) where D is minimum edit distance.
function lineDiff (left, right) {
left = left.split('\n');
right = right.split('\n');
let lookup = {};
// Store line numbers from LEFT in a lookup table
left.forEach(function (line, i) {
lookup[line] = lookup[line] || [];
lookup[line].push(i);
});
// Last line we matched
var minLine = -1;
return right.map(function (line) {
lookup[line] = lookup[line] || [];
var lineNumber = -1;
if (lookup[line].length) {
lineNumber = lookup[line].shift();
// Make sure we're looking ahead
if (lineNumber > minLine) {
minLine = lineNumber;
} else {
lineNumber = -1
}
}
return {
value: line,
from: lineNumber
};
});
}
RunKit link:
https://runkit.com/keithwhor/line-diff
What am I missing? I can't find other references to doing diffing like this. Everything just links back to that one paper.
Consider the following sequences (here each letter represents a line).
Where the first one is "left" and the second one is "right".The LCS is AAA. The minimal edit script from left to right is:
1. Insert B at beginning.
2. Delete B at end.
The proposed O(N) algorithm finds B as the first matching line, and thereby exhaust `left`. So it finds the common subsequence of just "B" and its edit script is:
1. Delete 3 As at beginning
2. Insert 3 As at end
which is longer.