I think it's rescaling all images to fit the training size. If that's the case, then an image with very different dimensions gets distorted and the network gets confused. Try something with a height/width ratio similar to the samples.
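For what it's worth, a quick way to see the difference (assuming they do a naive resize, which is just my guess; the target size here is made up) is to compare a stretched resize with an aspect-preserving pad in Pillow:

    from PIL import Image

    TARGET = (256, 256)  # hypothetical training size

    img = Image.open("photo.jpg").convert("RGB")

    # Naive resize: distorts anything with a very different aspect ratio.
    stretched = img.resize(TARGET)

    # Aspect-preserving alternative: shrink, then pad onto a square canvas.
    img.thumbnail(TARGET)                      # in-place, keeps aspect ratio
    padded = Image.new("RGB", TARGET, (0, 0, 0))
    padded.paste(img, ((TARGET[0] - img.width) // 2,
                       (TARGET[1] - img.height) // 2))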
I think that when you're not expected to publish papers to rationalize what you're doing, you're free to use any ugly hack that improves your results (a "kitchen sink" approach where you just combine the results of lots of unrelated techniques: extracting words from the URL, using the URL to fetch related textual content from the website, etc.). This gives private companies a competitive advantage over research institutions: their only purpose is to "make things work", not to introduce new techniques and offer interesting insights about them.
Lots of companies and teams are exploring deep neural networks for all kinds of applications. The Rekognition API is the only one I've found that provides an open API service right now. You can train a classifier using your own images, but you need to create an account and upload your images through their web application.
http://kephra.de/pix/Snoopy/thump/IMG_20130822_135928_640x48... <- here it thought it's a speedboat ... well, my boat is fast, but it's not a speedboat, it's a sailing boat. It offered several more boat types, but not a plain sailing boat. Interestingly, the last suggestion, at only 1%, could be considered right: "dock, dockage, docking facility".
I tried some other images from the lifestyle section of my homepage, but it looks as if the system has never seen a sewing machine before, as it gives "Low recognition confidence" and no tags.
It seems strange that they would include in their set of example images a picture of the most famous mausoleum in the world without it being tagged with mausoleum or tomb or anything like that.
If I uploaded my own picture of the Taj Mahal and it told me it was a Mosque, I wouldn't be surprised, and I'd probably be reasonably impressed. The dome and minarets do rather give that impression, and I wouldn't really expect a computer to be able to tell the difference.
The reason I find it odd is that I would expect the first example on a demo to be carefully chosen to show off the system in the best light. It would be one that has perfect or near-perfect tagging. Maybe later on, I would show the shortcomings with a tricky image like this.
Are there actually any image feature detectors and descriptors involved (like blob, edge and texture detectors) or is this solely based on artificial neural networks?
Interestingly, it has been shown that the result from some neural networks is equivalent to classification with a set of predefined filters. These filters could be considered feature descriptors. See this talk from CVPR: http://techtalks.tv/talks/plenary-talk-are-deep-networks-a-s....
AFAIK, it's using a deep neural network, which means the inputs are basically pixel values (possibly normalized), and all feature detection etc. is done in the layers of the network.
Yep, they try to learn an image's high-level features by training an autoencoder (a transform that takes an image and tries to reproduce the same image) via an hourglass-shaped multi-layer network. Here is a very readable paper by Hinton himself that describes the approach:
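Not their code, obviously, but a minimal sketch of that hourglass idea (layer sizes and the PyTorch framing are my own; Hinton's paper pretrains the layers with stacked RBMs rather than training end to end like this):

    import torch
    import torch.nn as nn

    # "Hourglass" autoencoder: encode down to a small code, then decode back
    # to the original pixels; trained to reproduce its own input.
    class Autoencoder(nn.Module):
        def __init__(self, n_pixels=28 * 28, n_code=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_pixels, 256), nn.ReLU(),
                nn.Linear(256, n_code),          # the narrow "waist"
            )
            self.decoder = nn.Sequential(
                nn.Linear(n_code, 256), nn.ReLU(),
                nn.Linear(256, n_pixels), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder()
    loss_fn = nn.MSELoss()  # reconstruction error against the input itself
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)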
Could it maybe be worthwhile to augment the data with simple image features? E.g. the human visual system is believed to rely on high-level/top-down as well as local/bottom-up features (although that might simply be because of the need to compress things for the low nerve count in the optic nerve).
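Something like this is what I mean by augmenting with bottom-up features (Sobel edge maps here, purely as an illustration):

    import numpy as np
    from PIL import Image
    from scipy import ndimage

    # Append simple bottom-up features (edge maps) as extra input channels
    # next to the raw pixels.
    gray = np.asarray(Image.open("photo.jpg").convert("L"), dtype=np.float32) / 255.0
    edges_x = ndimage.sobel(gray, axis=1)
    edges_y = ndimage.sobel(gray, axis=0)
    augmented = np.stack([gray, edges_x, edges_y], axis=-1)  # H x W x 3 input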
A deep net (to be specific: a deep belief network, which is a series of stacked RBMs, not stacked denoising autoencoders; there is a difference) can usually benefit from a moving-window approach (slicing up an image into chunks) to simulate a convolutional net. This can help a deep net generalize better.
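A rough sketch of the moving-window idea (window size and stride are arbitrary):

    import numpy as np

    def sliding_windows(img, size=8, stride=4):
        """Slice a 2-D image into overlapping patches (flattened), so each
        patch can be fed to the net separately, roughly mimicking a conv
        net's local receptive fields."""
        patches = []
        h, w = img.shape
        for top in range(0, h - size + 1, stride):
            for left in range(0, w - size + 1, stride):
                patches.append(img[top:top + size, left:left + size].ravel())
        return np.array(patches)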
That being said: even deep learning requires some sort of feature engineering at times (even if it's pretty good with either Hessian-free training or pretraining).
The main thing with images is making sure they're scaled properly.
The trick with deep belief networks in particular is making sure the RBMs have the right visible and hidden units (Hinton recommends Gaussian visible and rectified linear hidden units).
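For Gaussian visible units that usually means standardizing each input dimension first (zero mean, unit variance); a quick sketch of what I mean:

    import numpy as np

    # For Gaussian visible units, standardize each input dimension
    # before training the first RBM.
    def standardize(X):
        mean = X.mean(axis=0)
        std = X.std(axis=0) + 1e-8   # avoid division by zero for constant pixels
        return (X - mean) / std, mean, std

    X = np.random.rand(1000, 784).astype(np.float32)  # dummy batch of flattened images
    X_scaled, mean, std = standardize(X)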