Very nice paper. I am actually working on a project in a similar space using very similar techniques, though my emphasis is on speed of matching and retrieval, and on matching the objects in the image rather than the image as a whole. A little early demo here: http://www.youtube.com/watch?v=h3YldXhG3Qc&feature=chann...
That's just brilliant. It reminds me a bit of Microsoft's Photosynth.
I think this technique provides a lot of possibilities for future consumers. Maybe in the future you could cross-match your photos with an online database and auto-adjust the color and light balance accordingly.
E.g. let's say you went to visit Machu Picchu on a very cloudy and rainy day. You come home and realize your photos look terrible. You put your photos into a piece of software and match them with an online set of Machu Picchu photos. You click on "auto-adjust" and hey presto, they are transformed into a set of photos that look perfectly lit and balanced. Or am I just dreaming out loud?
Pretty cool. I'm not super familiar with machine learning, but from reading section 2 of their paper it sounds like you need to manually find a picture that matches the one you care about (the single positive) and a bunch of pictures that don't match. Then you run the algorithm and come up with a set of weights.
It would be nice if things were more automatic, like if a computer program could decide what features were unique (maybe also through machine learning it could learn that buildings are generally unique and the sky is not).
I haven't read the paper in full yet (nor am I a machine learning expert), but it seems to be training it against a precompiled dataset, not using human views:
"To learn the feature weight vector which best discriminates an image from a large “background” dataset, we employ the linear Support Vector Machine (SVM) framework. We set up the learning problem following [Malisiewicz et al. 2011] which has demonstrated that a linear SVM can generalize even with a single positive example, provided that a very large amount of negative data is available to “constrain the solution”. However, whereas in [Malisiewicz et al. 2011] the negatives are guaranteed not to be members of the positive class (that is why they are called negatives), here this is not the case. The “negatives” are just a dataset of images randomly sampled from a large Flickr collection, and there is no guarantee that some of them might not be very similar to the “positive” query image. Interestingly, in practice, this does not seem to hurt the SVM, suggesting that this is yet another new application where the SVM formalism can be successfully applied."
Haha, yes, but I mean it seems to imply it requires a starting pair; one additional photo selected by the human as a match in addition to the one we desire to find matches for.
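For anyone trying to picture the "one positive, lots of random negatives" setup from the quoted passage, here is a toy sketch. It is not the paper's implementation: it swaps the linear SVM for a plain margin perceptron (no regularization, no max-margin guarantee), and the 4-dimensional feature vectors and the names `train_margin_perceptron`, `score`, `query`, and `background` are all made up for illustration. The first two dimensions stand in for something distinctive about the query, the last two for something the background shares with it.

```python
import random


def score(w, b, x):
    """Linear score of feature vector x under weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_margin_perceptron(positive, negatives, max_epochs=1000):
    """Stand-in for a linear SVM: one positive example, many negatives.

    Repeats perceptron updates until every example clears a margin of 1
    (positive scores >= 1, negatives <= -1) or max_epochs is reached.
    """
    w = [0.0] * len(positive)
    b = 0.0
    data = [(positive, 1.0)] + [(x, -1.0) for x in negatives]
    for _ in range(max_epochs):
        updated = False
        for x, y in data:
            if y * score(w, b, x) < 1.0:
                # Example is inside the margin: move the boundary away from it.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                updated = True
        if not updated:
            break  # every example is outside the margin
    return w, b


# Hypothetical toy data: dims 0-1 are distinctive for the query,
# dims 2-3 look the same in the query and in the background.
rng = random.Random(0)
query = [1.0, 1.0, 0.5, 0.5]
background = [
    [rng.uniform(0, 0.3), rng.uniform(0, 0.3), rng.uniform(0, 1), rng.uniform(0, 1)]
    for _ in range(200)
]

w, b = train_margin_perceptron(query, background)
print("learned weights:", w, "bias:", b)
```

Once trained, `score(w, b, x)` can rank any candidate `x` against the query. Note that the only positive here is the query's own feature vector, so at least in this toy version no hand-picked matching photo is involved.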
Very cool. Figure 7 is particularly instructive as to the power of this approach.
It would also be interesting to see a comparison of images where the features are less pronounced, e.g., matching landscape images. One can imagine uniform weighting approaches performing better where, for example, color is a more important matching criterion than form/features.