When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad’ (nytimes.com)
117 points by bcaulfield on Nov 29, 2016 | 50 comments



I was fortunate enough to attend his talks at MIT CSAIL while working there.

Here are some notes in case anyone's interested -- [0] and [1]. I also recommend his TEDx talk.

[0] https://www.dropbox.com/s/v6bbktuywoqv2w3/jurgen-talk-notes....

[1] https://www.dropbox.com/s/ux53ism7fsxgo8x/jurgen-csail-talk2...


I always get skeptical when I hear of a name before I hear of a concrete achievement. I can respect his research, but there's a bit too much Jürgen Schmidhuber here before we even get into what his contributions are.


https://scholar.google.ca/citations?user=gLnCTgIAAAAJ&hl=en&...

He's been an AI researcher for a while. I think his biggest contribution is understanding the vanishing gradient problem and creating the LSTM architecture. LSTMs are widely used in both industry and academia, and many neural net architectures that aren't LSTMs are heavily inspired by the LSTM idea.
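
For the curious, a minimal sketch of the LSTM gating idea in NumPy (shapes, initialization, and gate ordering are illustrative choices of mine, not a faithful reproduction of any particular implementation):

    import numpy as np

    def lstm_step(x, h, c, W, b):
        # One LSTM step. x: input (D,), h: hidden (H,), c: cell state (H,).
        # W: (4H, D+H) weights, b: (4H,) bias; gate order f, i, o, g.
        H = h.shape[0]
        z = W @ np.concatenate([x, h]) + b
        f = 1 / (1 + np.exp(-z[:H]))         # forget gate
        i = 1 / (1 + np.exp(-z[H:2*H]))      # input gate
        o = 1 / (1 + np.exp(-z[2*H:3*H]))    # output gate
        g = np.tanh(z[3*H:])                 # candidate cell update
        c_new = f * c + i * g                # additive cell update: this is the
                                             # path that lets gradients survive
                                             # many timesteps
        h_new = o * np.tanh(c_new)
        return h_new, c_new

    D, H = 8, 16
    rng = np.random.default_rng(0)
    W, b = rng.normal(scale=0.1, size=(4*H, D + H)), np.zeros(4*H)
    h = c = np.zeros(H)
    h, c = lstm_step(rng.normal(size=D), h, c, W, b)

The additive cell update (instead of repeatedly squashing the state through a nonlinearity) is what connects the LSTM directly to the vanishing gradient problem.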

One of his students, Alex Graves, is a researcher at DeepMind who is seen as one of the top people in RNNs.


To be fair, it's a popular magazine piece written for the layperson.

I don't know enough about AI research to judge the value of Dr. Schmidhuber's contributions, but I've seen his name multiple times in past HN discussions.


>hear of a name before you hear of a concrete achievement

That's an excellent criterion.

I often wonder whether many famous past intellectuals, for whom I can't recall a single achievement, were mere celebrities. And if one can't name a famous true idea in a current academic field, perhaps the field itself is worthless.


It is a poor criterion, because it is so subjective and dependent on PR machines.

The OP just has not heard of any accomplishments, but anyone with a little expertise in deep (reinforcement) learning knows about the major contributions to the field by Schmidhuber.

With this criterion you are using popular media, and fields you don't know much about, to brush away the accomplishments of respectable scientists. Don't base your skepticism on your own lack of knowledge: that makes it selective. You cannot cut through the bullshit if you don't know how to wield a sword.


The criterion is not about assessing a particular individual's contributions, it's about choosing what and whom to investigate in the first place. Of course this is subjective, and rightly so.


>Dr. Schmidhuber also has a grand vision for A.I. — that self-aware or “conscious machines” are just around the corner — that causes eyes to roll among some of his peers.

I can't help but wonder if the sole reason AGI doesn't exist is that it hasn't been figured out yet.

While that statement sounds obvious on the face of it, the implication is that we may already possess both the sufficient computational resources and human intelligence to realize its creation.


We certainly have enough compute at this point: 10^15 FLOPS should be more than enough to run the brain by pretty much any analysis. Part of the issue is that evolution has had at least 1 million such creatures, over 52,000,000 weeks, to improve on since monkeys. So while human intelligent design of AI will certainly be a better algorithm than evolution, we may actually be a bit shy computationally of easily training an AI system, in spite of having more than enough to realize one.
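
As a rough back-of-envelope (a sketch assuming the 10^15 FLOPS brain estimate and the population/time figures above, all of which are loose assumptions):

    # Back-of-envelope: compute needed to *run* a brain vs. the compute
    # evolution "spent" searching. All figures are loose assumptions.
    BRAIN_FLOPS = 1e15            # rough real-time brain-emulation estimate
    POPULATION = 1e6              # "at least 1 million such creatures"
    WEEKS = 52_000_000            # per the comment above
    SECONDS_PER_WEEK = 7 * 24 * 3600

    evolution_flop = BRAIN_FLOPS * POPULATION * WEEKS * SECONDS_PER_WEEK
    print(f"evolution's search budget: ~{evolution_flop:.1e} FLOPs")  # ~3.1e34

    CLUSTER_FLOPS = 1e18          # hypothetical large training cluster
    years = evolution_flop / CLUSTER_FLOPS / (3600 * 24 * 365)
    print(f"same budget at 1e18 FLOPS: ~{years:.1e} years")           # ~1.0e9

So even if we can run a brain, brute-force matching evolution's search budget is out of reach; our training algorithm has to beat evolution's by many orders of magnitude.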


It doesn't really tell us much. There is no reason ancient Athens or Taixue or Gundishapur couldn't have developed calculus or Newtonian physics. No resource was limiting them in the development, but those fields weren't developed until 1600+ years later.



That is so cool...


I think the printing press was a key invention that was needed for this. So many great things were lost because, historically, only one or a few copies of them existed.


AGI (as in, a mathematical theory of universal intelligence) has already been figured out by one of Schmidhuber's students, and proven to be uncomputable: we may never have sufficient computational resources.

https://en.wikipedia.org/wiki/AIXI
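
(For reference, since uncomputability is the punchline: AIXI picks actions by an expectimax over all programs consistent with its history, weighted by a 2^(-length) Solomonoff-style prior. Transcribed from memory of Hutter's definition, so details may vary:

    a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
           (r_k + \cdots + r_m) \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

Here U is a fixed universal monotone Turing machine and \ell(q) is the length of program q; the sum over all consistent programs q is what makes this uncomputable.)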


AIXI is a pretty deeply flawed theory of "universal intelligence". It can't handle modelling itself, it can't really handle the large dimensionality of its hypothesis space, it can't really handle imprecise stimulus information, and it can be made arbitrarily "stupid" in terms of "lost" rewards by selecting a Universal Turing machine to do the Solomonoff Induction over for which the "right" programs are arbitrarily long. And all of that is before we get to mere computational limitations or the uncomputability issue.


> AIXI is a pretty deeply flawed theory of "universal intelligence"

What would you propose as an alternative? If nothing, fine, but how can we compare, when we only have a single best (or, if you prefer, flawed) thing?

> It can't handle modelling itself

It can add its internal states to the environment, and hence model these internal states.

> the large dimensionality of its hypothesis space

It's bounded by how many compressors / programs are available to it. Calculating the length of those programs that are consistent with the environment is feasible.

> handle imprecise stimulus information

I'm not sure if you mean imprecise stimuli here or imprecise sensors.

> can be made arbitrarily "stupid"

This can be seen as a flaw, or as a simple property (or even a feature). Something that can be optimal, and "stupid", does not detract much from its ability to be optimal.

> before we get to mere computational limitations or the uncomputability issue

Just because it is uncomputable does not mean the theory is flawed. Sure, it is not practical, and we like practical things, but it is still valuable to have such a theory. Especially when approximations do yield practical applications.

Marcus Hutter on these issues: http://hunch.net/?cat=14


>What would you propose as an alternative? If nothing, fine, but how can we relate, when we only have a single best (or if you prefer: flawed) thing?

I tend to favor Karl Friston's "free-energy minimization" theory of the brain. For specifying tasks in engineering situations, I like the KL-control paradigm: the agent's task is to minimize (via the normal mechanisms of active inference) the Kullback-Leibler divergence between its induced distribution over latent variables (ie: causal models of the world) and some "target" distribution.
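
Spelled out as a sketch (my notation, not Friston's): the agent picks a policy \pi minimizing

    \pi^* = \arg\min_{\pi} D_{\mathrm{KL}}\!\left( q_{\pi}(z) \,\|\, p^{*}(z) \right)
          = \arg\min_{\pi} \sum_{z} q_{\pi}(z) \log \frac{q_{\pi}(z)}{p^{*}(z)}

where q_\pi(z) is the distribution over latent variables induced by acting under \pi, and p^*(z) is the target distribution encoding the task.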

>It can add its internal states to the environment, and hence model these internal states.

No, it can't. I'd have to do a bunch of work to sketch some proofs, but without the ability to consider multiple observable random variables and use hierarchical Bayes on them, AIXI will not be able to detect that certain environmental states are actually equivalent to its own internal states. Hell, since Solomonoff Induction is incomputable, it's not even in its own hypothesis space, so it can never locate a program that generates itself.

>Its bounded by how many compressors / programs are available to it. Calculating the length of these programs that are consistent with the environment is feasible.

See below, please.

>I'm not sure if you mean imprecise stimuli here or imprecise sensors.

Both. From the perspective of any possible reasoning device, the problem is simply low likelihood precision (that is, high variance/entropy of the likelihood function). If I have so much noise in my sensory stimulus that all I see is 50% "heads" and 50% "tails" in my input bits, without any ordering to those bits, then I do not have the information to infer any complex causal structure (ie: the real world) behind those bits.

The point being: when the environment is noisy, it forces the mind to favor simpler explanations, even when those explanations are not the Truth, due to the parameter space over possible Truths (for example, random seeds added to some causal structure) being too large and spreading out the probability mass too thinly. Since Solomonoff Induction deals with algorithmically random programs as its hypothesis space, this means that any noise in the environment sufficient to add one bit of random seed to the shortest generating program cuts the probability mass allocated to the correct hypothesis by half.
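
A toy illustration of that halving, assuming the usual 2^(-length) prior over programs (K and the seed lengths are made-up numbers):

    # Under a 2**-length prior, every extra bit of random seed the shortest
    # correct program must carry halves that program's prior mass.
    K = 40  # assumed length (bits) of the shortest noise-free generating program

    for noise_bits in range(5):
        prior_mass = 2.0 ** -(K + noise_bits)
        print(f"{noise_bits} seed bit(s): prior mass = 2^-{K + noise_bits} = {prior_mass:.3e}")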

>This can be seen as a flaw, or as a simple property (or even a feature). Something that can be optimal, and "stupid", does not detract much from its ability to be optimal.

AIXI is only optimal for a given Turing machine (ie: state-machine with alphabet). The "arbitrarily stupid prior" thing is just to say that for any program, we can create a Turing machine for which that program takes up arbitrarily much tape-space, thus making that program arbitrarily improbable in the Solomonoff Measure over that Turing machine's programs.

>Just because it is uncomputable, does not mean the theory is flawed. Sure, it is not practical, and we like practical things, but it is still valuable to have such a theory. Especially when approximations do yield practical applications.

Hence my calling them "mere" computational issues.


Jürgen Schmidhuber is an early neural net pioneer, especially with regard to making recurrent neural nets a reality. As others have mentioned, he was instrumental with LSTMs, but he also advised Alex Graves on his work on Connectionist Temporal Classification (CTC), which was instrumental in allowing neural nets to be used for speech recognition.


He was too early. People have been futzing around with neural nets since the 1950s. We can now make them work, but the training process requires huge amounts of compute power to do a huge number of iterations making tiny changes on each cycle. This has only been feasible in the last few years.


You're missing the importance of techniques that allow backprop along very deep nets. That's related to, but distinct from, the amount of computation required to train a model.
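
To make that concrete, a toy illustration of the vanishing-gradient effect those techniques address (per-layer gain and depth are illustrative numbers):

    # Backprop through T layers multiplies the error signal by roughly one
    # Jacobian factor per layer; factors slightly below 1 shrink it
    # exponentially with depth, regardless of how much compute you have.
    factor, T = 0.9, 100
    grad = 1.0
    for _ in range(T):
        grad *= factor
    print(f"gradient after {T} layers: {grad:.2e}")  # ~2.66e-05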


Seriously, check out Schmidhuber's work (http://people.idsia.ch/~juergen). From his website: "Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire."


A remarkably inefficient plan to be lazy.


Never underestimate the power of being lazy. Power steering wasn't invented by an exercise freak.


Actually, it was invented by an exercise freak: Francis Davis was a marathon runner in the '20s, around the same time he invented power steering.


This is why I love hacker news.


Power steering worked.


Perhaps he achieved his dream when he was 15, and is just passing off his AI's achievements as his own.


Mind, it's pretty questionable how connectionism is supposed to give us "AIs" that can act autonomously and improve themselves.


Jürgen maintains a lot of sock-puppet accounts (even on Wikipedia and in Amazon reviews). Let's see if any of them show up in this thread. He is a perfectly good researcher and his students have done a lot of excellent work, but I don't approve of his choice to use an army of sock-puppets or his (I will try to put this charitably) somewhat non-standard ideas on how academic credit should be apportioned.


But you're cool with anonymously posting a public accusation of unprofessional behaviour without supporting evidence?

(edit: apparently people are cool with this. That's weird. I'm not claiming the parent is untrue; it's just that you shouldn't talk smack about people in public without presenting evidence if you don't use your real name.)


> if you don't use your real name

From what does the supposed requirement to use one's real name derive?


It's not a requirement. It's common decency. Accusing someone of sockpuppetry is no different than calling him a liar. Doing it from behind a pseudonym without offering any evidence is irresponsible and cowardly, as it gives that person no way to defend his reputation.


No one's reputation is at stake with internet comments from anonymous posters.

The internet is protected from the forces that threaten physical forums where each voice is a real person. You cannot pay off the internet to talk nice about you, and you can't follow anon home and beat him up for speaking negatively. It sounds like Jurgen has accumulated some bad karma on the internet; this is not baseless accusation as much as it is keeping ourselves at a healthy level of skepticism regarding someone who has already lost our trust. The cowards with loud, strawman accusations are usually appropriately downvoted when their criticisms are out of place.


To be fair that accusation is also mentioned in the article.


Maybe I read past it, but I didn't see it.


From paragraph 7: "He has been accused of taking credit for other people’s research and even using multiple aliases on Wikipedia to make it look as if people are agreeing with his posts."


"Has been accused". Evidence lacking.


I do not take a position on the accusation. The purpose of the comment to which you are responding is to support emmelaich's assertion that the accusation is mentioned in the article.


Hey Jurgen, I'm a big fan of your work


Hey Jurgen, I'm a big fan of your work!


Clearly you're being downvoted by Teodolfo's sockpuppets.


Does he maintain them personally, or are they LSTM bots? :)

And btw, I think his sense of humour - and brilliance - comes across best in person. I recommend attending one of his talks or watching a video. But just do it once, otherwise you'll hear the same jokes again.


In a nod to his history compression schemes, he sometimes slightly changes the punchlines of his jokes.

http://people.idsia.ch/~juergen/creativity.html


I think Juergen is an interesting and intelligent guy. Besides his work on RNN architectures, he also believes it is possible that a short computer program generated our universe.

I think it would be interesting for him to talk about the relationship between his belief that a deterministic universe theory is possible and his practice of using statistical learning algorithms. Some people might view ANNs (including RNNs) as good learning algorithms for sorting out statistical patterns of probabilistic systems, but not as helpful for analyzing deterministic systems. But I think there is some good insight to be had from exploring the value of statistical learning algorithms on deterministic systems.


Unlikely


Yeah as an easter egg.

Seriously though, what title will it bestow on Wolfram?


The tone of the article title reminds me of the posts of the Mindpixel guy, who also claimed to have been developing strong AI but turned out to be a person with severe mental illnesses...

This is not a good article; please don't upvote it.

If you think LSTM is interesting, I am 100% with you... but there are surely much better articles about LSTM, or about the researcher himself, that you could pick to share here.

This article doesn't add a lot of value, takes a lot of effort to make its point, and feels like a huge waste of time when you finish reading it.


Your post doesn't make sense - how is that related to this? And why would not upvoting this article make someone a good HN user?


> developing strong AI but turned out to be a person with severe mental illnesses

Those are not mutually exclusive. If you want to claim that Chris McKinstry was simply insane, and his project without any scientific merit, you should actually make the case for it. Brushing projects away only because their conceiver was mentally ill is not nice at all.


The point is that "All AI will call X father" is a claim that someone like him would make. That was the entire point.

My point is about how the article is titled and written, not about the researcher.



