
It does this for 20,000 different object categories - this is getting close to matching human visual ability (and there are huge societal implications if computers reach that standard).

This is the most powerful AI experiment yet conducted (publicly known).



"It does this for 20,000 different object categories - this is getting close to matching human visual ability"

No, it isn't. This classifier cannot identify theme variations or unknown rotations; it will confuse new objects with objects it already knows; it is unable to cope with camera distortion; it needs fixed lighting; it has no capacity for weather; it does not work in the time you'd need to run away from a tiger; it requires hundreds of times more data than a human eye presents; and it does a far lower quality job, all while completely losing the ability to give a non-boolean response.

To say this is approaching human abilities is to have no idea what human abilities actually are.

"This is the most powerful AI experiment yet conducted"

No, it isn't. Please stop presenting your guesses as facts. Cyc runs circles around this, as do quite a few things from the Netflix challenge, as well as dozens of other things.

I personally have run far larger unsupervised neural networks than this, and I am not a cutting edge researcher.


I'm not a Machine Learning / AI expert, so I have to ask: if running a neural network on 16,000 cores with a training set of 10 million objects isn't cutting edge research -- and if running "far larger" networks than this, as you say you have, also isn't cutting edge research -- then please tell me: what is cutting edge research?

I ask this question in all seriousness; I'd really like to know.

(And yes, I see that your username is that of a noted AI researcher. Who died in 2010. So if you're actually his beta simulation, then I'll indeed be rather impressed...)


I'm his son.

Let's take the example of The Netflix Prize, a $1 million bounty that the movie shipping organization ran several years ago. Their purpose was to improve their ratings prediction algorithm, under the pretext that people frequently ran out of ideas of what to rent, and that a successful suggestion algorithm would keep people as customers longer after that point.

So, they carefully defined the success rate of their algorithm - that is, have it predict some set of actually-rated movies X on a 1-5 scale, then take the square root of the arithmetic mean of the squared errors from the real ratings - which we'll call root mean square error, or RMSE - and you have your "score," where closer to zero is better.
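As a sketch (my own illustration, not the actual contest scoring harness), that score works like this:

```python
# Minimal RMSE sketch: square root of the mean of squared prediction errors.
# Ratings are on a 1-5 scale; lower is better, 0 would be perfect.
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    assert len(predicted) == len(actual)
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Example: three predictions against three real ratings.
print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0]))  # ~0.645
```

Note the square root at the end: without it you'd have plain mean squared error, a different (though related) number.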

Their predictor had a score of, I think, 0.973-something (it's been years; don't quote me on that). Their challenge was simple.

Beat their score by ten percent, and you trigger a one month end-of-game. At the end of that month, whoever's best wins le prize. One million dollars, obligatory Doctor Evil finger and all.

Netflix provided a little over 100 million actual (anonymized) ratings, where all you had was a userID, a movieID, a real rating, and, separately, a mapping "this movieID is this title." You were only allowed to use datasets in your solution that were freely available to everybody, and you had to reveal them and write a paper about your strategy within one month after you accepted le prize, honh honh honh.
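That data layout is simple enough to sketch. Assuming a plain CSV of userID,movieID,rating triples (the real contest files were laid out differently, grouped per movie and including dates):

```python
# Load hypothetical userID,movieID,rating triples into a lookup table.
import csv
import io

sample = io.StringIO("6,11,4.0\n6,42,2.5\n9,11,5.0\n")
ratings = {}  # (user_id, movie_id) -> rating
for user_id, movie_id, rating in csv.reader(sample):
    ratings[(int(user_id), int(movie_id))] = float(rating)

print(len(ratings))      # 3 triples loaded
print(ratings[(6, 11)])  # 4.0
```

The point is just how sparse the signal is: each row is two opaque IDs and one number, nothing more.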

Seriously, it was awesome. They were going to do a second one, but lawyers, and the world sadded.

So, there, you've got a ten times larger dataset. So surely sixteen thousand cores is the drastic thing, right?

Well, not really. I was running my solution on 32 Teslas, which at the time were $340 each in bulk and had 480 cores apiece. So I actually "only" had 15,360 cores, a whopping four percent short of Google's approach. Several years ago, that rig cost me about the price of a used car, and I was able to resell the cards afterwards as used, without the bulk discount, for almost exactly what I paid for them in the first place.

Swish.

And I mean, I've got to imagine that someone else chasing that million dollar prize who thought they were going to get it invested more than I did. There were groups of up to a dozen people, data mining companies, etc.

So if one dude sitting in his then-Boise apartment can spend like $11k on a dataset ten times this size, over a commercial prize?

Yeah.

Cyc still pantses all of us.


An image is a lot more complicated than a pair of ids and a rating. Counting the number of rows in the training database is misleading. I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time (http://councilroom.com, predict player actions given partial game states).

Don't get me wrong, the Netflix prize was cool.

What's cool about this is that Google hasn't given the learning system a high level task. They basically say, figure out a lossy compression for these 10 million images. And then when they examine that compression method, they find that it can effectively generate human faces and cats.
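To make the "figure out a lossy compression" framing concrete, here is a toy linear autoencoder in numpy (my own illustration; the paper's network was deep, sparse, and vastly larger): squeeze each input through a narrow code and train it to reconstruct the input.

```python
# Toy linear autoencoder: compress 8 features down to a 3-dim code, then
# train by gradient descent on mean squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # 200 "images", 8 features each
W_enc = rng.normal(scale=0.3, size=(8, 3))    # encoder: 8 dims -> 3-dim code
W_dec = rng.normal(scale=0.3, size=(3, 8))    # decoder: 3-dim code -> 8 dims

lr = 0.05
for _ in range(2000):
    code = X @ W_enc                  # the "compressed" representation
    recon = code @ W_dec              # lossy reconstruction of the input
    err = recon - X
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X - (X @ W_enc) @ W_dec) ** 2))
print(mse)  # well below the ~1.0 a zero predictor would score on this data
```

Nothing here is told what a face is; the bottleneck alone forces the code to capture whatever structure best reconstructs the inputs, which is the spirit of the Google result.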


"An image is a lot more complicated than a pair of ids and a rating."

Predicting someone's reaction to a given movie is a lot more complicated than a pair of IDs and a rating, too, it turns out.

Let's take the speculation out of this.

You can get features of an image with simple large-blob detection; four recurrent Boltzmann machines with half a dozen wires each can find the corners of a nose-bounding trapezoid quite easily. They'll get the job done in less than the 1/30 sec screen frame on the limited Z80 knockoff in the original dot-matrix Game Boy. You'll get better than 99% prediction accuracy. It takes about two hours to write the code, and you can train it with 20 or 30 examples unsupervised. I know, because I've done it.
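For a sense of how cheap crude blob detection is, here is a toy stand-in (my own sketch, nothing like the Boltzmann-machine setup described): threshold a grayscale grid and report the bounding box of the largest bright connected component.

```python
# Find the bounding box of the largest bright blob via flood fill.
from collections import deque

def largest_blob_bbox(img, threshold=128):
    """Return (top, left, bottom, right) of the largest 4-connected bright blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best, best_size = None, 0
    for y in range(h):
        for x in range(w):
            if img[y][x] >= threshold and not seen[y][x]:
                # flood fill one connected component of bright pixels
                q, cells = deque([(y, x)]), []
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    cells.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] >= threshold and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(cells) > best_size:
                    best_size = len(cells)
                    ys, xs = [c[0] for c in cells], [c[1] for c in cells]
                    best = (min(ys), min(xs), max(ys), max(xs))
    return best

# A 5x5 "image" with one bright 2x2 patch.
img = [[0] * 5 for _ in range(5)]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    img[y][x] = 255
print(largest_blob_bbox(img))  # (1, 1, 2, 2)
```

This is a few dozen lines and runs in linear time in the number of pixels, which is the kind of cheapness being claimed above.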

On the other hand, getting 90% prediction accuracy from movie rating results takes teams of professional researchers years of work.

.

"I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time"

And you won't get anywhere near the prediction accuracy I will with noses. That's the key understanding here.

It's not enough to say "you can do the job." If you want to say one is harder than the other, you actually have to compare the quality of the results.

There is no meaningful discussion of difficulty without discussion of success rates.

I mean, I can "detect" noses on anything by always returning 0, if you ignore accuracy.

.

"What's cool about this is that Google hasn't given the learning system a high level task."

Yes it has. Feature detection is a high level task.

.

"They basically say, figure out a lossy compression for these 10 million images."

I have never heard a compelling explanation of the claim that locating a bounding box is a form of lossy compression. It is my opinion that this is a piece of false wisdom that people believe because they've heard it often and have never really thought it over.

Typically, someone bumbles out phrases like "information theory" and then completely fails to show any form of the single important characteristic of lossy compression: reconstructibility.

Which, again, is wholly defined by error rate.

Which, again, is what you are casually ignoring while making the claim that finding bounding boxes is harder than predicting human preferences.

Which is false.

.

"they find that it can effectively generate human faces and cats."

Filling in bounding boxes isn't generation. It's just paint-by-number geometry. This is roughly equivalent to using a point detector to find the largest error against a mesh, using that to select Voronoi regions, taking the color of each such point and filling its region, then suggesting that that's also a form of compression, and that drawing the resulting dataset is generation.
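The paint-by-number idea is easy to make concrete. A toy sketch (my own illustration, not the linked vorcoder code): store a handful of (point, color) pairs and repaint every pixel with the color of its nearest stored point.

```python
# Fill a grid by nearest stored point: Voronoi "paint by number".
def voronoi_fill(width, height, points):
    """points: list of ((x, y), color). Returns a height x width color grid."""
    def nearest_color(px, py):
        return min(points,
                   key=lambda p: (p[0][0] - px) ** 2 + (p[0][1] - py) ** 2)[1]
    return [[nearest_color(x, y) for x in range(width)] for y in range(height)]

grid = voronoi_fill(4, 2, [((0, 0), "red"), ((3, 0), "blue")])
print(grid)  # left half red, right half blue in each row
```

The output image is mechanically repainted from the stored points; nothing new is "generated," and the stored data carries no error bound, which is the point about reconstructibility above.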

And it isn't, because it isn't signal reductive.

Here, I made one for you, so you could see the difference. Those are my friends Jeff and Joelle. Say hi. The code is double-sloppy, but it makes the point.

http://fullof.bs/outgoing/vorcoder.html

See how I'm getting a dataset that isn't compression? See how that dataset is being used to make the original image, but nothing's being generated?

Same thing.


The person who invented Boltzmann machines is the inventor of this technique: he invented them in the 80s and spent over 20 years trying to get them to actually work on difficult tasks.

Your rant about this not being compression or whatever you're trying to say is completely off the mark. You don't seem to understand what this work is about.

The Netflix challenge is a supervised learning challenge: you have lots of labeled data. This technique is about using unlabeled data.

(Side note: At one point, Geoff Hinton and his group using this technique had the best result in the netflix challenge, but were beaten out by ensembles of algorithms.)

Cyc has nothing to do with this and is a huge failure at AI.

tl;dr: Having read your comments, you don't seem to know what you're talking about, and seem to readily discount some of the most prominent machine learning researchers in the world today. You're obscuring important results that newcomers might have found interesting to follow up on.


"and seem to readily discount some of the most prominent machine learning researchers in the world today."

Your reading skills don't seem to be up to par, since I have discredited a list of zero people.

"You're obscuring important results that newcomers might have found interesting to follow up on."

Not only have I obscured no results, but this isn't actually something I have the power to do.


I appreciate your insight into the original article and your help placing it into the context of the broader field.


It's a kind claim, but I'm actually pretty much an outsider who dabbles. I haven't a clue what the state of the art is; Cyc is from the 1980s.


Ha, then I'm appreciating your honesty :-)


I'm not a large-scale ML person, and I'm not intending to take away from the achievement of the team in the OP, but experiments in large-scale, unsupervised learning have been going on for a long time (even using the autoencoder approach). When you think about it, large scale requires unsupervised...

Here is an old example with hundreds of millions of records and instances:

http://aaaipress.org/Papers/KDD/1998/KDD98-028.pdf

Both authors are now with Google.

Also, people here may not be as up to speed on the state of the art in face rec as they think they are. It's not as much of an unsolved problem as it was even 10 years ago.


"When you think about it, large scale requires unsupervised..."

Not necessarily. Crowdsourcing is another option, like Google's image tagging game, reCAPTCHA, et cetera.

Pay a herd of people to do things, and they'll do things for you. You don't have to pay them in money. Telling them they have a high score is often enough.


Yes, "requires" was too strong. I should have said they go well together. I was trying to get at the fact that it's highly common for large-scale work to be unsupervised.


Face recognition usually uses a hand-coded layer followed by a machine learning algorithm. This technique automatically devises that hand-coded layer. It also did this for 20,000 other categories and can be applied equally well to audio or any other data type. Huge difference.


The large-number-of-categories result was the most novel and surprising to me.


> It does this for 20,000 different object categories

With 15.8% accuracy.

> This is the most powerful AI experiment yet conducted (publicly known).

It's only powerful because they threw more cores at it than anyone else has previously attempted. From a quick skim of the paper, there does not appear to be much novel algorithmic contribution here. It's the same basic autoencoder that Hinton proposed years ago. They just added some speed-ups for many cores.

It's a great experiment though. You shouldn't detract from its legitimate contributions by making outlandish claims.


That in itself is fairly interesting: it says we can make dramatic improvements just by throwing more processing power at the problem. Whatever happens on the algorithms research side in the coming years, you can count on us having access to more processing power.


I think this is the most important aspect of this paper. Throwing more computing power at the problem increases performance significantly. It is possible that our algorithms are adequate but our hardware is not.


To a computer vision researcher, 15.8% on 20k categories is phenomenal.


> This is the most powerful AI experiment yet conducted (publicly known).

That's an ill-defined statement. AI is a vast and diverse field: what makes one demonstration more "powerful" than another? There are definitely other projects that could be viewed as being in the same class of "powerful" as this cluster.

This is certainly an interesting paper, but it has to be viewed in the context of a large and active field.


That's true. Let's say in machine learning then.


Let's not, because it's still wrong on the order of thirty counterexamples.

Let's say we'll stop making broad proclamations about the global best in a field we know very little about.


Andrew Ng has set many state of the art results on various data sets using similar approaches as the one described in the paper.

Here is a reasonably approachable talk he gave about it. http://www.youtube.com/watch?v=ZmNOAtZIgIk


This is a very interesting talk.

Thank you for sharing with me. :)

In return, I will offer you two interesting non-sequiturs, because I don't have anything topical and a non-sequitur seems like it's worth half what something germane would be.

.

Bret Victor, "Inventing on Principle." First 5 minutes are terribly boring. Give him a chance; it's 100% worth it.

http://vimeo.com/36579366

.

Damian Conway, "Temporally Quaquaversal Virtual Nanomachine."

It's as funny as it sounds.

http://yow.eventer.com/events/1004/talks/1028



