But then I think you are giving up the main advantage of using Mechanical Turk over just roughly matching pixels. Pixel matching can get the wrong answer where a human, with the context of being human, will know what the image is supposed to be and get the right answer, captcha style.
Example: an iron fence with a scenic view behind it, where the slices are too narrow for someone to match just two of them.
What about doing the same thing, but fuzzing a few pixels along each edge to defeat pixel-distance matching? You could even make each edge plain black for a few pixels in. The human eye would be able to complete the image (it would just look like it has black bars), but I'm guessing it would be enough to prevent a simple algorithmic solution.
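To make the black-bar idea concrete, here is a rough sketch in Python with NumPy. Everything in it is my own assumption for illustration, not something from the thread: the function names (`slice_vertically`, `black_out_edges`, `edge_distance`), the vertical strips, the default 3-pixel border, and the "naive matcher" that only compares the two touching edge columns.

```python
import numpy as np

def slice_vertically(image, n_slices):
    """Split an H x W (or H x W x C) image array into vertical strips."""
    return np.array_split(image, n_slices, axis=1)

def black_out_edges(strip, border=3):
    """Return a copy of the strip with `border` columns on each side set to black.

    `border` is a hypothetical parameter; how many pixels are actually
    needed is a guess.
    """
    width = strip.shape[1]
    if border < 1 or 2 * border >= width:
        raise ValueError("border must be >= 1 and leave some of the strip visible")
    masked = strip.copy()
    masked[:, :border] = 0    # left edge
    masked[:, -border:] = 0   # right edge
    return masked

def edge_distance(left_strip, right_strip):
    """Naive matcher: sum of absolute differences between the touching edge columns."""
    a = left_strip[:, -1].astype(np.int64)
    b = right_strip[:, 0].astype(np.int64)
    return int(np.abs(a - b).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(8, 40), dtype=np.uint8)
    strips = [black_out_edges(s, border=2) for s in slice_vertically(image, 4)]
    # Every pairing now scores the same, so the edge columns carry no signal.
    print(edge_distance(strips[0], strips[1]), edge_distance(strips[0], strips[3]))
```

After masking, every pair of strips gets the same score from this particular matcher, so it has nothing to rank by. That only covers a matcher that looks at the outermost columns; one that compares pixels further in is a separate question.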
Edit: This is essentially the same idea as the iron fence posted above.