Great write-up! We did something very similar when trying to find duplicate product images for a consumer review site we were working on. Our implementation desaturated the image, broke it into a fixed number of tiles, and generated a histogram of each tile's values. Then we put a threshold on the histogram values, and had each value in the tile represent a bit. Combine the bits, and we had a hash to store in the DB. Our hashes were a bit larger, so images within a certain hamming distance were flagged, rather than just looking for exact hash matches. It took quite a bit of tuning for us to get good results, but it seemed to work pretty well. Do you see many false positives with such a small processed image size (the 9x8 one, I mean)?
In practice we are using an image size of 17x16 which will result in a hash size of 256 bits and currently it seems to work pretty well. I ran the algorithm through the whole dataset (about 330.000+ icons) and I would say that from all the duplicate matches about 1% where false positives.
Also, we will be integrating this into the reviewing process for an iconset, where we also do a manual quality check, showing possible matches to something currently uploaded so skimming over one or two false positives isn't such a big deal and we where more interested in the speed of the algorithm.