For most reasonable quantities of data, diff speed is Good Enough as it is, and ...

_pvxk · on March 29, 2017

What if you have unreasonable quantities of data? I've as yet not come across a really good program that lets me do `bigdiff <(xzcat bigresult-old.xz) <(xzcat bigresult-new.xz)|less` (where the files are gigabytes of text with fairly few differences) in a reasonable amount of time/memory. I've used hacks that only work on a line-by-line basis (or use some hardcoded marker in the input) to try to read both files in parallel and run a real diff on a subsection when seeing a difference between the markers, but it's far from trivial getting it to work well (and I unfortunately don't have time to shave that yak :/)
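For anyone wanting to try the "read both files in parallel" idea, here is a minimal Python sketch. It is not the hack described above, just one way to do it under stated assumptions: the names `stream_diff`, `_fill` and `window` are made up for illustration, the real diff on each subsection is done with the standard library's `difflib.SequenceMatcher`, and it resynchronizes on the last matching run of lines in each window instead of on a hardcoded marker. If a changed region is longer than `window` lines with nothing in common, it gives up on that window and the output can be far from minimal.

```python
import difflib
from itertools import islice


def _fill(buf, it, size):
    """Top up buf to `size` lines; return True once `it` is known to be exhausted."""
    buf.extend(islice(it, size - len(buf)))
    return len(buf) < size


def stream_diff(old_lines, new_lines, window=1000):
    """Yield ('-' or '+', 1-based line number, line), holding at most
    `window` lines of each input in memory at a time."""
    old_it, new_it = iter(old_lines), iter(new_lines)
    old_buf, new_buf = [], []
    old_base = new_base = 0  # lines already consumed before each buffer
    while True:
        old_done = _fill(old_buf, old_it, window)
        new_done = _fill(new_buf, new_it, window)
        finished = old_done and new_done
        matcher = difflib.SequenceMatcher(None, old_buf, new_buf, autojunk=False)
        opcodes = matcher.get_opcodes()
        if not finished:
            # Stop after the last matching run and carry the unmatched tail
            # into the next window, so an insertion does not shift everything.
            equal = [k for k, op in enumerate(opcodes) if op[0] == "equal"]
            if equal:
                opcodes = opcodes[: equal[-1] + 1]
            # With no matching run at all, the whole window is flushed as a change.
        old_used = new_used = 0
        for tag, i1, i2, j1, j2 in opcodes:
            if tag != "equal":
                for k in range(i1, i2):
                    yield ("-", old_base + k + 1, old_buf[k])
                for k in range(j1, j2):
                    yield ("+", new_base + k + 1, new_buf[k])
            old_used, new_used = i2, j2
        if finished:
            return
        del old_buf[:old_used]
        del new_buf[:new_used]
        old_base += old_used
        new_base += new_used
```

Any two line iterators work as input, so for the xz case something like `stream_diff(lzma.open("bigresult-old.xz", "rt"), lzma.open("bigresult-new.xz", "rt"))` should avoid decompressing to disk. Memory stays bounded by `window`, but `SequenceMatcher` can be slow on windows with many differences, so the window size is a trade-off.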

Arkanosis · on March 29, 2017

I always have an old version of the source code of Solaris' bdiff with me (https://github.com/Arkanosis/Arkonf/blob/master/tools-src/bd...), just in case. It might have changed in the meantime in OpenIndiana / Illumos.

It was a very significant improvement in speed a few years ago, though over time my RAM has grown faster than the files I need to diff, and I haven't had any difficulty with the regular Linux diff for a long time.

_pvxk · on March 30, 2017

Wow, zero memory usage and immediate output on files where GNU diff just sits there eating memory until everything is read! Thanks, that's fantastic.

avereveard · on March 29, 2017

As far as diff performance goes, concatenating the two large files using the diff replace-all syntax is faster: it uses O(1) memory and O(n) time, and it's only slightly space-inefficient.

I'd also say it might beat the OP's algorithm in performance under certain assumptions (i.e. large writes vs. scan read performance).
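One reading of "replace all syntax" is a patch with a single hunk that deletes every line of the old file and adds every line of the new one. A rough Python sketch of that reading follows. The function name `replace_all_patch` is hypothetical, and it assumes both inputs are regular UTF-8 files that can be read twice (once to count lines for the hunk header, once to stream them), so it would not work on pipes as written.

```python
def replace_all_patch(old_path, new_path):
    """Yield a unified diff that deletes every old line and adds every new one.

    Streams both files, so memory use does not grow with file size.
    """
    def count(path):
        with open(path, encoding="utf-8", newline="\n") as f:
            return sum(1 for _ in f)

    old_n, new_n = count(old_path), count(new_path)
    yield f"--- {old_path}\n"
    yield f"+++ {new_path}\n"
    # An empty side conventionally starts at line 0 in a unified hunk header.
    yield f"@@ -{1 if old_n else 0},{old_n} +{1 if new_n else 0},{new_n} @@\n"
    for sign, path in (("-", old_path), ("+", new_path)):
        with open(path, encoding="utf-8", newline="\n") as f:
            for line in f:
                yield sign + line
                if not line.endswith("\n"):
                    # Only the last line of a file can lack a trailing newline.
                    yield "\n\\ No newline at end of file\n"
```

Writing the result out is just `sys.stdout.writelines(replace_all_patch("old.txt", "new.txt"))`. The patch is roughly as large as both files combined, which is the space inefficiency mentioned above.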

zamalek · on March 29, 2017

Here's a port to C# that's hopefully more straightforward for people using other languages[1] (especially languages without slices).

> minimal logic

As the port demonstrates, Patience has very little logic - especially if you remove the optimizations.

[1]: https://github.com/jcdickinson/difflib/blob/master/DiffLib/P...

nix0n · on March 29, 2017

Does anyone know of a way to use Patience Diff with SVN?