Not bad, but I think it would be more useful if I could submit an image and have the engine give me all the facts it could dig up about it, based on its context in other pages, geo tags and camera type (if available), etc.
I think we're going to see some very interesting developments along these lines very soon. Scary stuff too. Imagine submitting a picture of yourself and finding out what the internet knows about you based on your physical appearance. Better keep those Facebook profiles private, folks! More than that, you'll have to convince your friends to keep their profiles private if they have pics of you as well!
<tangent> This is what is rather frightening about the next web; even if you want to remain anonymous, you're going to have to do battle with all the other folks who are more than happy to post and tag pictures of you for the world to see (with good-natured intentions, I might add). Remember that embarrassing moment at that party where you had a little too much to drink? Oh, you were too drunk to recall? Well, it's on somebody's public Facebook profile now. With your name on it. And if I am your employer, what's to stop me from taking your badge photo and plugging it into a service to pull down other pictures of you from the cloud? :O </tangent>
Anyway, back to the matter at hand! I do see your service as being particularly valuable to IP holders who want to know who is displaying their copyrighted images or logos without authorization. If your site were comprehensive enough, you could probably go freemium and become a paid tattle-tale. Take that a step further and "For a nominal fee, you can click here to have our partners at LegalZoom.com send a takedown notice."
You can use http://imageheader.com/alpha , which reads an image file and displays all the information about the image, including geolocation tags. This is an alpha version.
Aka "opt-out". Which is why privacy is dead: who's going to be able to keep up with that, especially when other people have no problem tagging you whether or not you know about the picture?
Assuming you're on the system they are doing the tagging on (it's easy when it's just Facebook, but a photo could be tagged on Flickr with your real name and you'd never know), and you check your email (or whatever method of notification they provide), and a host of other things that put more work on those who are trying to control their exposure. That's the problem with opt-out, and that's why thinking you can maintain your privacy by monitoring what other people do is folly.
If the image tag on Facebook isn't linked to your profile, there's no way for anyone to find it, unless they were friends with the poster and browsed to the photo.
Flickr is more public, but they don't do any user-graph linking at all; the photo annotations are plain text. I don't think the annotations are even exposed to Google.
Old-school, low-tech, real-time, in-person approach: "Hi. How's it going?"
Your TinEye approach, later that day, via email: "Hi. How's it going? You don't know me, but I surreptitiously took your photo in the park earlier today, uploaded it to a reverse image search engine, and eventually uncovered your personal data after an exhaustive web search. Care to get some coffee sometime?"
Why don't you A/B test these two approaches to find out which works best?
Why "later that day"? Don't you have internet access from your phone? :-) I agree the old-skool approach would work better for me, but the high-tech one is still a boon for stalkers and such.
Their crawler could use some work. (I uploaded a PR screenshot of my software. They flagged several copies that are really just domains 301-redirected to my homepage.)
The algorithm, however, is beyond awesome. They found half a dozen instances of my screenshot on the Internet, including my site, some download sites, and a Chinese pirate or two who had gone to the trouble of watermarking my image.
TinEye was created by Idée Inc. Idée develops advanced
image identification and visual search software for photo
wire agencies, stock photography firms, entertainment media
companies and some of the world's leading imaging firms
including Adobe Systems Inc.
In other words, yes they intend for it to be used by content owners to find unauthorized use of their IP on the web. On the other hand, they claim to respect robots.txt and give their crawler name on the same page.
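For anyone who wants to check what a robots.txt rule would do to a crawler that honors it, here is a minimal sketch using Python's standard urllib.robotparser. The crawler name "ExampleImageBot" and the example.com URLs are made up for illustration; the real crawler name is whatever their page lists, and this says nothing about how their crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one named crawler out of /photos/,
# leave everything open to all other crawlers.
robots_txt = """\
User-agent: ExampleImageBot
Disallow: /photos/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse in-memory lines, no network access

# "ExampleImageBot" is an assumed name, not the real crawler's user agent.
blocked = parser.can_fetch("ExampleImageBot", "https://example.com/photos/me.jpg")
allowed = parser.can_fetch("ExampleImageBot", "https://example.com/about.html")
other = parser.can_fetch("SomeOtherBot", "https://example.com/photos/me.jpg")

print(blocked, allowed, other)
```

This only tells you what a well-behaved crawler should do with those rules; it does nothing against one that ignores robots.txt.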
I have an idea for how to do a better reverse image search engine. The problem with this one is that it only matches exact images. But what if it was possible to match against any feature within the source image, rather than the exact, entire source image you search for?
For each image in the index, break it up into 4x4 tiles, then store a hash code for each tile. Then repeat the process, but offset the boundary of each tile by 1px along the X axis. Repeat 2 more times. Then, for each offset along the X axis, offset down along the Y axis. So you store 16 hashes per 4x4 pixel area.
Now, when someone searches for an image, repeat that hashing algorithm for the source image. The results page then returns any image that contains a 4x4 tile that is also contained in the source image, ranked by the number of tiles that are common between the source and result image.
The end result is that you can see how the features within an image are used in other images -- so if someone takes the red stapler from Office Space ( http://www.yunasville.com/img/102005/milton.jpg ) and puts it into a different image, and you search for that red stapler, the results page will still return the photoshopped image, because it'll match the 4x4 tiles on the stapler in both images.
I've explained this in a convoluted way, but hopefully I've communicated the essence of the idea.
On one hand, there will be more results to filter through. But that's fine, the image results are still ranked effectively. On the other hand, it's more computationally expensive.
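Here is a rough sketch of the tile-hashing idea as described above, in plain Python. It assumes grayscale images stored as lists of rows of 0-255 integers, uses SHA-1 as the "hash code", and fakes the stapler and the other pictures with random noise; the helper names (tile_hashes, build_index, search, noise, paste) are all invented for this example. It only matches tiles that are pixel-for-pixel identical.

```python
import hashlib
import random
from collections import Counter, defaultdict

TILE = 4  # tile edge length in pixels
# The 16 grid offsets: 4 shifts along X times 4 shifts along Y.
OFFSETS = [(dx, dy) for dy in range(TILE) for dx in range(TILE)]


def tile_hashes(image):
    """Hash every TILE x TILE tile of the image on all 16 shifted grids."""
    height, width = len(image), len(image[0])
    hashes = set()
    for dx, dy in OFFSETS:
        for y in range(dy, height - TILE + 1, TILE):
            for x in range(dx, width - TILE + 1, TILE):
                pixels = bytes(p for row in image[y:y + TILE] for p in row[x:x + TILE])
                hashes.add(hashlib.sha1(pixels).hexdigest())
    return hashes


def build_index(images):
    """Map each tile hash to the names of the images that contain it."""
    index = defaultdict(set)
    for name, image in images.items():
        for h in tile_hashes(image):
            index[h].add(name)
    return index


def search(index, query):
    """Rank indexed images by the number of tile hashes shared with the query."""
    counts = Counter()
    for h in tile_hashes(query):
        for name in index.get(h, ()):
            counts[name] += 1
    return counts.most_common()


def noise(width, height, seed):
    """Deterministic random image, used as stand-in picture data."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def paste(background, patch, left, top):
    """Return a copy of background with patch pasted at (left, top)."""
    out = [row[:] for row in background]
    for j, row in enumerate(patch):
        out[top + j][left:left + len(row)] = row
    return out


stapler = noise(8, 8, seed=1)  # stand-in for the red stapler
images = {
    "photoshopped": paste(noise(20, 20, seed=2), stapler, left=5, top=7),
    "unrelated": noise(20, 20, seed=3),
}
results = search(build_index(images), stapler)
print(results)
```

Searching for the stapler returns the image it was pasted into, even though the paste position is not aligned to any 4-pixel boundary; that is what the 16 offsets buy you. The index ends up holding roughly one hash per pixel of every image, which is where the cost comes from.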
First, looking at the examples, it doesn't seem to be restricted to exact images, not by a long shot.
Second, your approach is hopelessly naive. ;) Consider: rescaling, re-encoding using a lossy image format, color adjustments, and so on.
Good news though, I bet your approach is way, way, _less_ computationally expensive than whatever it is they are doing ;)
(My guess would be something like SIFT or SURF, probably minus (some) rotation invariance to speed things up, combined with a whole bunch of hackery to make the feature database search suitably fast while still acceptably accurate. Worth your time to google + read up as those algorithms can achieve positive red stapler identification fairly robustly, but you'll be in for rather a lot of math.)
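To make the color-adjustment objection concrete, here is a small sketch. It hashes a made-up 4x4 tile exactly (SHA-1 over the pixel bytes) and also computes a much cruder signature, one bit per pixel saying whether it is brighter than the tile's mean (roughly the "average hash" trick). To be clear, this is not SIFT or SURF and not a claim about what TinEye does; the function names and pixel values are invented for illustration.

```python
import hashlib


def tile_hash(tile):
    """Exact hash of a tile's pixel values (rows of 0-255 ints)."""
    return hashlib.sha1(bytes(p for row in tile for p in row)).hexdigest()


def coarse_signature(tile):
    """One bit per pixel: is the pixel brighter than the tile's mean?"""
    flat = [p for row in tile for p in row]
    mean = sum(flat) / len(flat)
    return tuple(p > mean for p in flat)


# Made-up grayscale tile, and the same tile with every pixel one level brighter.
tile = [
    [10, 200, 30, 180],
    [220, 40, 190, 20],
    [15, 210, 35, 170],
    [230, 25, 185, 45],
]
brighter = [[p + 1 for p in row] for row in tile]

exact_match = tile_hash(tile) == tile_hash(brighter)
coarse_match = coarse_signature(tile) == coarse_signature(brighter)
print(exact_match, coarse_match)
```

A brightness change too small to see is enough to break the exact hash, while the thresholded signature survives it. Surviving rescaling and lossy re-encoding takes considerably more than this, which is where the heavier feature-based methods come in.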
Somebody from Google gave a talk at my college a few years back about image searching algorithms. He explained an algorithm that does more general matching than TinEye, and it seemed to work great. One of the demonstrations was a search for a Starbucks logo that turned up a bunch of pictures that had a Starbucks cup somewhere in them. I wish I could remember more details, but unfortunately a quick Google search didn't turn up anything.
From the site after a 0 match result: "TinEye looks for the specific image you uploaded, not the content of the image. TinEye cannot identify people or objects in an image."
So it's not what some of us might have been afraid of.
Nor me when I found it a couple of months ago and I needed such a service. I wanted to source the true manufacturer of a product and all I was finding at the time were a lot of traders who were using the same official product photos. I think it will improve though and the web needs it.
I remember thinking about a service like this but for audio: for example, you post a link to a YouTube video (or some similar service) and it would match the audio in the video with a song name. That would be pretty cool and useful.
As a photographer I must say that iStockphoto and the non-licensed (the proper term escapes me) images on Flickr give bloggers and web designers no excuses.
Content owners pay a lot of money to protect their content. They pay money to lawyers, lobbyists, and companies like Media Sentry to snoop in on P2P traffic.
It's not a stretch to say they will pay to find stolen images. I'd venture they'd pay pretty well too.
:)