A landmark 2012 paper transformed how software recognizes images (arstechnica.com)
110 points by eaguyhn on Jan 2, 2019 | 29 comments



The paper is "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, and is available here:

https://papers.nips.cc/paper/4824-imagenet-classification-wi...
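
For anyone who wants to poke at the architecture, here's a minimal sketch of the paper's layer stack in PyTorch (my framework choice; the authors used Krizhevsky's cuda-convnet, and I've omitted details like the two-GPU split and local response normalization):

    import torch
    import torch.nn as nn

    # Rough sketch of the AlexNet layer stack: five conv layers,
    # three max-pools, three fully connected layers.
    class AlexNetSketch(nn.Module):
        def __init__(self, num_classes=1000):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
                nn.MaxPool2d(3, stride=2),
                nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(3, stride=2),
                nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(3, stride=2),
            )
            self.classifier = nn.Sequential(
                nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
                nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x):
            return self.classifier(torch.flatten(self.features(x), 1))

    # A 227x227 input yields the 6x6x256 feature map before the FC layers.
    logits = AlexNetSketch()(torch.randn(1, 3, 227, 227))
    print(logits.shape)  # torch.Size([1, 1000])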


I enjoyed genji256's comment on the article:

genji256:"Anecdote: I was one of the three reviewers for that paper and I tend to review harshly. A few years after it was published, I started worrying that I had given it a bad score and completely missed a field-changing paper. I frantically dug through my emails and found the review. Turns out I gave it a 7/10 so it wasn't THAT bad, though my summary makes me cringe a bit: 'A paper which, by giving precise details on the various tricks used, is a useful addition to the deep learning literature. I wish comparisons with other techniques were somewhat fairer.' "


I'm curious, are there papers in other ML fields that could be considered breakthroughs comparable in impact to AlexNet?

For NLP the recent ELMo and BERT papers for word embeddings come to mind, although their scope is somewhat different from AlexNet's.


Sure, lots! We've made lots of progress just in the last decade or two: LSTM, recurrent, and convolutional nets. Slightly older (late '90s), but I think Random Forests were a pretty significant breakthrough.

Another huge one is Paul Graham's own work using Naive Bayes to filter spam.
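
The core of that approach fits in a few lines. A minimal sketch in Python with made-up toy data (real filters, including Graham's, add smarter tokenization and probability tweaks):

    import math
    from collections import Counter

    # Hypothetical toy corpus, just to illustrate the mechanics.
    spam = ["win money now", "free money offer", "click now win"]
    ham = ["meeting at noon", "project status update", "lunch at noon"]

    def train(docs):
        counts = Counter(w for d in docs for w in d.split())
        return counts, sum(counts.values())

    spam_counts, spam_total = train(spam)
    ham_counts, ham_total = train(ham)
    vocab = len(set(spam_counts) | set(ham_counts))

    def log_prob(msg, counts, total):
        # Sum log-probabilities; add-one (Laplace) smoothing keeps
        # unseen words from zeroing out the whole product.
        return sum(math.log((counts[w] + 1) / (total + vocab))
                   for w in msg.split())

    def is_spam(msg):
        # Equal class priors assumed, so they cancel out.
        return log_prob(msg, spam_counts, spam_total) > \
               log_prob(msg, ham_counts, ham_total)

    print(is_spam("free money"))              # True
    print(is_spam("status update at noon"))   # False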


Both convolutional and recurrent architectures were developed in the late '80s. LSTM (an improved RNN) was published in 1997.


Maybe LeNet from LeCun et al.? It was before my time, but the architecture is still influential. The more important paper would probably be "Phoneme Recognition Using Time-Delay Neural Networks", which introduced Time-Delay NNs, from which ConvNets arise naturally.
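
(Aside: a time-delay layer is essentially what we'd now call a 1-D convolution over the time axis. A minimal sketch in PyTorch, with sizes invented for illustration:

    import torch
    import torch.nn as nn

    # Each output frame sees a fixed window of input frames -- the
    # "time delays". Sizes here are made up: 16 input features per
    # frame, 32 hidden units, a 5-frame window.
    tdnn_layer = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5)

    frames = torch.randn(1, 16, 100)   # (batch, features, time)
    out = torch.relu(tdnn_layer(frames))
    print(out.shape)                   # torch.Size([1, 32, 96])

Sliding the same weights across time is exactly the weight sharing that 2-D ConvNets apply across image space.)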

Other architectures, like LSTM from Hochreiter & Schmidhuber (1997), "Long short-term memory", should also be mentioned.

Not sure which paper would be the most important for SVMs.

I don't know what to do about concepts that took a long time to develop, like backpropagation. Research is usually incremental.


Most research looks more incremental than people think. AlexNet is based on much work by Fukushima, Yann LeCun, and others. Likewise, ELMo and BERT are based on 2015 work on pretrained language models:

https://papers.nips.cc/paper/5949-semi-supervised-sequence-l...


Man, I took a neural network class in uni and loved it. It was offered by the Psych department (though I was in Comp Sci). All I remember now is the MATLAB labs and some of the terminology, but otherwise nothing at all. My career took me nowhere near this subject matter and I've regrettably lost most recollection of it, so I appreciate this article explaining the basics again.


> Right now, I can open up Google Photos, type "beach," and see my photos from various beaches I've visited over the last decade. I never went through my photos and labeled them; instead, Google identifies beaches based on the contents of the photos themselves. This seemingly mundane feature

“Seemingly mundane”?? This is scary as hell.


It's "mundane" because it's not magic to look at a photo of a beach and determine that the subject is a beach. It's just difficult for a computer. There are rather magical bits of image processing software out there, automatically changing backgrounds, adjusting focus and lighting post exposure, and allowing us to pick the moment when the subject wasn't blinking, for example.


It’s hard to explain “from scratch”. A blind person can certainly perform the task. But, uh, how do you translate the image to something sensible?

Heck, I’d bet high school kids who have never seen the ocean could often be fooled by images that aren’t pristine sand and deep blue sea.

I have a low level representation that encompasses a bunch of sensory memory and imagination, and a high level representation “beach”. It takes a while for me to believe your low and high kinda match mine. I suspect it takes you a while to believe in me too.

It’s hard, partly because computers are dumb. It’s also hard because I can’t really communicate my low-level understanding of beach.


It would be magic to have a technique whereby a blind person could classify images.


Scary how? If you send a photo to Google's computers, expect someone to look at it and classify it. If the expectation is that sheer volume drowns out that possibility, then one needs to reassess their understanding of the purpose of computers.

What are your expectations, and why would they elicit such an extreme reaction to the status quo?


Google could take those labels, build user profiles, and auction them off on its massive ad exchange. You'd then be hit with ads based on your photos, which could lead to some very scary or embarrassing moments, depending on your photos.

This image classification can be done on client devices as well, with encrypted backups to the cloud. But expecting Google to do that is unreasonable.


Still not seeing what's scary there. If you don't want other people to see your photos, you shouldn't be sending (or uploading) them anywhere. If you don't want to see ads, don't use services that show ads.

Sure, image classification can be done anywhere. People could classify their own images without any algorithm. The point is, image classification isn't scary, and I don't see how it could be. Using classifications to manipulate people might be scary, but that has little to do with whether it is done by Google, the server, or the client. If you don't want your images classified, don't send them to photo stores whose entire reason for being is the organization and classification of data.


> Still not seeing what's scary there. If you don't want other people to see your photos, you shouldn't be sending (or uploading) them anywhere.

That's disingenuous: Google strongly and repeatedly encourages Android users to enable Google Photos for private backup with a single click. Users don't like to lose photos, and their phone vendor is telling them how to avoid it. This often isn't an informed or premeditated decision.


I'm always puzzled by this type of argument. Surely it is not wrong for a company to advertise its own products and make it easy to sign up. It's on the user to be a critical consumer, and it stands to reason that if you opt in to a service called "Company A Photos", Company A might just have some access to your photos to provide those services.


Sure, that's a problem, but a very different problem. I don't think I've ever seen a profit-focused company seriously give any weight to encouraging its customers to make good, informed decisions.


I knew they were doing this. I am just objecting to calling this “mundane”.


"[S]eemingly mundane". As in something you (a human) wouldn't need to put any thought or effort into. It doesn't actually occur to Muggles that there are things that they do that are actually hard problems for computers, even if they understand that there are all sorts of things computers can do in a flash that would exhaust their personal resources for weeks on end.


What's scary about this? I think it's pretty awesome and makes my photo collection a lot more useful to me.


Are you thinking they're talking about using Google Image search instead of Google Photos? Photos is all stuff you've explicitly uploaded/backed up to Google; why would it be 'scary as hell' to be able to search those and have it work?


I am just objecting to calling these practices "mundane".


I mean you explicitly uploaded those to have Google archive and organize them for you. That's exactly what Google is doing here.

They didn't go out of their way to obtain your photos and run their categorization stuff on them.


I read it as trying to describe the act of image recognition, rather than the business practice of routinely applying it to users' private-ish data. The "seemingly mundane" part is because image recognition is so easy for people that we take it for granted, not that auto-scanning people's photos is or should be considered mundane. I agree it's not worded perfectly.


I'd go even further and say that, in the absence of any sharing of those tags or using them to advertise, what's the issue with running photos through a neural net like they do?


Apple Photos does that, on device.


Secure Connection Failed.

What's going on, arstechnica?


Seems like they've got some server problems and are working on it (https://twitter.com/arstechnica/status/1080502151363715072)

It's also gradually coming back together, one page resource at a time.



