Hacker News

For most reasonable quantities of data, diff speed is Good Enough as it is, and it's certainly not worth trading off accuracy for speed. In fact, if anything, we should be going in the other direction: Patience Diff is not quite as fast as regular diff, but the output is much higher-quality in my experience, particularly with non-trivial amounts of divergence between codebases.

http://git.661346.n2.nabble.com/Bram-Cohen-speaks-up-about-p...



What if you have unreasonable quantities of data? I've as yet not come across a really good program that lets me do `bigdiff <(xzcat bigresult-old.xz) <(xzcat bigresult-new.xz)|less` (where the files are gigabytes of text with fairly few differences) in a reasonable amount of time/memory. I've used hacks that only work on a line-by-line basis (or use some hardcoded marker in the input) to try to read both files in parallel and run a real diff on a subsection when seeing a difference between the markers, but it's far from trivial getting it to work well (and I unfortunately don't have time to shave that yak :/)
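A rough Python sketch of the marker-based hack described above, for illustration only. The names `blocks`, `chunked_diff` and `is_marker` are made up here, and it assumes both inputs contain the same markers in the same order; an inserted or deleted marker would misalign every later block.

```python
import difflib
from itertools import islice, zip_longest


def blocks(lines, is_marker):
    """Yield lists of lines, starting a new block at each marker line."""
    block = []
    for line in lines:
        if is_marker(line) and block:
            yield block
            block = []
        block.append(line)
    if block:
        yield block


def chunked_diff(old_lines, new_lines, is_marker):
    """Diff two line iterables block by block.

    Only one pair of blocks is held in memory at a time. Hunk line numbers
    printed by difflib are relative to the block, so each differing block is
    preceded by a header giving its absolute starting lines.
    """
    old_no = new_no = 1
    pairs = zip_longest(blocks(old_lines, is_marker),
                        blocks(new_lines, is_marker), fillvalue=[])
    for old, new in pairs:
        if old != new:
            yield f"@@ block at old line {old_no}, new line {new_no} @@\n"
            # Skip the two "---"/"+++" file header lines unified_diff emits.
            yield from islice(difflib.unified_diff(old, new, n=0), 2, None)
        old_no += len(old)
        new_no += len(new)
```

The line iterables could be the compressed files themselves, for instance `lzma.open(path, "rt")` for each `.xz` file, with the result passed to `sys.stdout.writelines`. Memory use is bounded by the largest block, so this only helps if markers are reasonably frequent.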


I always have an old version of the source code of Solaris' bdiff with me (https://github.com/Arkanosis/Arkonf/blob/master/tools-src/bd...), just in case. It might have changed in the meantime in OpenIndiana / Illumos.

It was a very significant improvement in speed a few years ago, though over time my RAM has grown faster than the files I need to diff, and I haven't had any difficulty with the regular Linux diff for a long time.


Wow, zero memory usage and immediate output on files where GNU diff just sits there eating memory until everything is read! Thanks, that's fantastic.


As far as diff performance goes, concatenating the two large files using the diff replace-all syntax is faster: it uses O(1) memory and O(n) time, and it's only slightly space-inefficient.

I'd also say it might beat the OP's algorithm in performance under certain assumptions (i.e. large writes vs. scan read performance).
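One reading of the "replace all" idea, sketched in Python: emit a single normal-format change command that deletes every old line and adds every new line. The function names are hypothetical, and the sketch assumes UTF-8 files that end with a newline. It makes two streaming passes (one to count lines, one to copy them), so memory stays constant apart from the longest line.

```python
def count_lines(path):
    """Count newline-terminated lines without loading the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def span(n):
    """Format the line range 1..n the way normal-format diff does."""
    return "1" if n == 1 else f"1,{n}"


def replace_all_diff(old_path, new_path):
    """Yield a valid (but maximally non-minimal) normal-format diff."""
    n_old, n_new = count_lines(old_path), count_lines(new_path)
    if n_old == 0 and n_new == 0:
        return
    if n_old == 0:
        yield f"0a{span(n_new)}\n"        # pure addition
    elif n_new == 0:
        yield f"{span(n_old)}d0\n"        # pure deletion
    else:
        yield f"{span(n_old)}c{span(n_new)}\n"
    with open(old_path, encoding="utf-8", newline="\n") as f:
        for line in f:
            yield "< " + line
    if n_old and n_new:
        yield "---\n"                     # separator used only by change commands
    with open(new_path, encoding="utf-8", newline="\n") as f:
        for line in f:
            yield "> " + line
```

The output is roughly the size of both inputs combined, which is the space cost the comment alludes to.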


Here's a port to C# that's hopefully more straightforward for people using other languages[1] (especially languages without slices).

> minimal logic

As the port demonstrates, Patience has very little logic - especially if you remove the optimizations.

[1]: https://github.com/jcdickinson/difflib/blob/master/DiffLib/P...
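To give a feel for how little logic is involved, here is a minimal Python sketch of the core idea as it is commonly described: match the common prefix and suffix, anchor on lines that appear exactly once in both inputs, keep the longest order-preserving run of those anchors (found by patience sorting), and recurse between anchors. It is not a translation of the linked C# port, the function names are made up, and regions with no unique common lines are simply left unmatched, where a fuller implementation would typically fall back to an ordinary diff.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_anchors(a, b):
    """Return (i, j) pairs for lines unique in both a and b, restricted to
    the longest subsequence that is increasing in both i and j."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    cands = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]
    tails = []      # tails[k]: smallest b-index ending a chain of length k+1
    tail_idx = []   # index into cands of that chain's last element
    prev = [None] * len(cands)
    for k, (_, j) in enumerate(cands):
        p = bisect_left(tails, j)
        prev[k] = tail_idx[p - 1] if p > 0 else None
        if p == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[p] = j
            tail_idx[p] = k
    out = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:          # walk the back-pointers of the longest chain
        out.append(cands[k])
        k = prev[k]
    return out[::-1]


def patience_matches(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Return matched (index_in_a, index_in_b) pairs in increasing order."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    matches, suffix = [], []
    while alo < ahi and blo < bhi and a[alo] == b[blo]:          # common prefix
        matches.append((alo, blo))
        alo, blo = alo + 1, blo + 1
    while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:  # common suffix
        ahi, bhi = ahi - 1, bhi - 1
        suffix.append((ahi, bhi))
    anchors = unique_common_anchors(a[alo:ahi], b[blo:bhi])
    pa, pb = alo, blo
    for i, j in anchors:
        i, j = i + alo, j + blo
        matches += patience_matches(a, b, pa, i, pb, j)  # gap before anchor
        matches.append((i, j))
        pa, pb = i + 1, j + 1
    if anchors:
        matches += patience_matches(a, b, pa, ahi, pb, bhi)  # gap after last
    matches += reversed(suffix)
    return matches
```

Lines of `a` and `b` that do not appear in the returned pairs are the deletions and insertions. For example, `patience_matches(list("abcd"), list("acbd"))` pairs up `a`, `c` and `d` and leaves `b` as one deletion plus one insertion.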


Does anyone know of a way to use Patience Diff with SVN?



