Great write-up! We did something very similar when trying to find duplicate prod...

razius · on April 3, 2014

In practice we are using an image size of 17x16, which results in a hash size of 256 bits, and so far it seems to work pretty well. I ran the algorithm over the whole dataset (more than 330,000 icons), and I would say that about 1% of all the duplicate matches were false positives.
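For anyone wondering where the 256 comes from: a rough sketch, assuming the hash compares each pixel with its right-hand neighbour, so a 17-pixel-wide row yields 16 bits and 16 rows yield 256. The function names (`dhash_bits`, `hash_length`, `hamming`) and the "left brighter than right" bit convention are made up for illustration and are not the production code; the input is a plain grid of grayscale values, so no imaging library is needed.

```python
import random


def hash_length(width, height):
    """Number of bits produced by row-wise neighbour comparisons."""
    return (width - 1) * height


def dhash_bits(pixels):
    """Difference hash of a grayscale grid (list of equal-length rows).

    Assumed convention: a bit is 1 when the left pixel is brighter
    than its right-hand neighbour, otherwise 0.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes stored as ints."""
    return bin(a ^ b).count("1")


# A stand-in for an icon already shrunk to 17 columns x 16 rows of grayscale.
rng = random.Random(0)
grid = [[rng.randrange(256) for _ in range(17)] for _ in range(16)]

# A grid that gets darker from left to right sets every bit.
gradient = [list(range(16, -1, -1)) for _ in range(16)]

h = dhash_bits(grid)
g = dhash_bits(gradient)
print(hash_length(17, 16), hamming(h, h), hamming(h, g))
```

Two icons would then be flagged as possible duplicates when `hamming` between their hashes falls under some threshold; what that threshold should be depends on the dataset.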

Also, we will be integrating this into the review process for an iconset, where we also do a manual quality check, by showing possible matches for whatever is currently being uploaded. Skimming over one or two false positives isn't such a big deal there, and we were more interested in the speed of the algorithm.

szidev · on April 4, 2014

That's pretty impressive performance given the hash size and speed. Thanks for sharing!