Deep Reinforcement Learning: Pong from Pixels (karpathy.github.io)
189 points by Smerity on May 31, 2016 | 13 comments



Fascinating.

Many years ago, when I wanted to become a programmer and didn't know anything about code, I used to fantasize about programs and be amazed by them. Code was like dark magic.

This is how I feel today about machine learning. Neural networks, liquid state machines. It's wonderful voodoo to my eyes.

I hope one day I get to work in that field; it seems so useful for solving big world problems. I have noticed a definite rise in articles about it being written and shared on HN lately, which is great.

For those completely in the dark, I found this library to have great wiki pages about the basics of neural network programming. Great read, I recommend it. https://github.com/cazala/synaptic/wiki/Neural-Networks-101


Oooh, liquid state machines/reservoir computing, what delicious voodoo. Look up optical reservoir computing, what wonderful nonsense. Smash the thing into a million random pieces and look out for the pieces that happen to make your problem easy. What. But it works.


Reinforcement learning is one of the most exciting areas of research in machine learning and AI right now, in my opinion. It is going to play a heavy role in creating AI that can make decisions in dynamic environments.

A great introduction to the topic is the book Reinforcement Learning: An Introduction by Sutton & Barto. You can find the official HTML version of the 1st edition and a PDF of a recent draft of the 2nd ed. here: https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html


In almost all networks playing real-time games, there's very high jitter in the inputs. Even when the machine is moving straight, it's always very keen on doing some wiggling with the other keys.

My question is: Is it possible to eliminate that by further training? Naively you could drop 'stupid' inputs, but I assume that may also mess with the machine's understanding.


Note that the output of the network is stochastic, not a fixed value. You could certainly tweak the output sampling function to reduce jitter.

Further training may also reduce this, but technically, since the jitter is not causing any reduction in the reward, it might not. The best approach would likely be to alter the reward system to discourage jittery play... though again, there may be little point, because the jitter does not reduce fitness in the first place.

I suppose where this is important is in robotics where jittery movement might actually be dangerous, or wear down hardware. In that case, you could certainly use an output smoothing function and tweak the reward.
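
Roughly what I have in mind, as a sketch only; it assumes the policy outputs a single probability of pressing UP each frame, and all names here are made up rather than taken from the article:

    import numpy as np

    # Raw scheme (as I read the post): sample an action from p_up
    # independently every single frame, which is what produces the jitter.
    def sample_action_raw(p_up, rng=np.random):
        return 'UP' if rng.uniform() < p_up else 'DOWN'

    # One possible smoothing tweak: low-pass filter p_up before sampling,
    # so single-frame spikes don't flip the paddle back and forth.
    class SmoothedSampler:
        def __init__(self, alpha=0.8):
            self.alpha = alpha      # higher = smoother but laggier
            self.p_smooth = 0.5     # start undecided

        def act(self, p_up, rng=np.random):
            self.p_smooth = self.alpha * self.p_smooth + (1 - self.alpha) * p_up
            return 'UP' if rng.uniform() < self.p_smooth else 'DOWN'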


Maybe you could introduce a (small) penalty for every keystroke. This might select against unnecessary movement and thus reduce jitter.
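
A rough sketch of what I mean (the names and values are invented, and it assumes a stand-still action exists; with only UP/DOWN available the penalty would be constant and change nothing):

    # Subtract a tiny cost whenever a key is actually pressed, on top of the
    # usual +1 / -1 / 0 Pong reward, so the policy gradient slightly favours
    # quieter play. `move_cost` is a made-up tuning knob.
    def shaped_reward(game_reward, action, move_cost=0.001):
        pressed = action in ('UP', 'DOWN')   # 'NOOP' would be the stand-still action
        return game_reward - (move_cost if pressed else 0.0)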


Note that in this Pong example in particular, every frame he gives the network the option to either go up or down, but no option to stand still. So it _has_ to be jittery.

But as others have commented, adding a small penalty on every move and giving it the option to stand still (along with some normalization?) might give a much smoother result.
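
A minimal sketch of what the stand-still option could look like, assuming we just widen the output layer to three logits (the names and shapes are illustrative, not what the post actually uses):

    import numpy as np

    ACTIONS = ['UP', 'DOWN', 'NOOP']

    def softmax(z):
        z = z - np.max(z)            # for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # `hidden` is the hidden-layer activation, `W_out` a (3, H) weight matrix.
    # Standing still is now just another action the policy can learn to prefer.
    def pick_action(hidden, W_out, rng=np.random):
        probs = softmax(W_out @ hidden)
        idx = rng.choice(len(ACTIONS), p=probs)
        return ACTIONS[idx], probs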


I went back to that part of the article several times, because like you, I found it a bit odd that there was no option to stand still.

The first part of the article definitely seems to give the impression that the choice is between UP or DOWN every frame.

But then a bit further on it was a bit more ambiguous and I could also interpret it as giving a probability for UP and a separate one for DOWN. Then it could also choose neither. But then it also could choose both, and you need a conflict resolution procedure (do neither, pick the one with highest probability, maybe just roll again?). Unless the actual game also has two buttons and you can just do whatever the game engine will do if you press both.
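
Just to make that second reading concrete, here is a sketch (all names invented) of two independent button probabilities with one possible tie-break, namely keeping the more probable button when both fire:

    import numpy as np

    def resolve(p_up, p_down, rng=np.random):
        up = rng.uniform() < p_up        # sample each button independently
        down = rng.uniform() < p_down
        if up and down:                  # conflict: keep the likelier button
            return 'UP' if p_up >= p_down else 'DOWN'
        if up:
            return 'UP'
        if down:
            return 'DOWN'
        return 'NOOP'                    # neither pressed: stand still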

Another possibility might be to model the output a bit more like a human player would do it. First I'd change it into a series of timings + note-on/note-off commands (like MIDI), then perhaps add jitter to the timings (making sure the note-off doesn't jitter before the corresponding note-on). I've read that adding this kind of noise to a NN tends to improve its robustness, so that might help?

Most of those changes would happen as a transformation step between the output layer and the simulation input, so I presume the learning algo itself can mostly stay the same. But there are probably a few snags to that as well.
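
A rough sketch of that transformation step (everything here is illustrative, nothing from the article): collapse the per-frame actions into press/release events, then jitter the timings while keeping each release after its press.

    import numpy as np

    def to_events(actions, key='UP'):
        # turn a per-frame action list into (press_frame, release_frame) pairs
        events, start = [], None
        for t, a in enumerate(actions):
            if a == key and start is None:
                start = t                          # "note-on"
            elif a != key and start is not None:
                events.append((start, t))          # "note-off"
                start = None
        if start is not None:
            events.append((start, len(actions)))
        return events

    def jitter_events(events, sigma=1.0, rng=np.random):
        out = []
        for on, off in events:
            on_j = on + rng.normal(0, sigma)
            off_j = off + rng.normal(0, sigma)
            out.append((on_j, max(off_j, on_j + 1)))   # release stays after press
        return out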


With further training I don't think it's possible: since those movements are neither useful nor harmful, they will appear in winning and losing matches alike. A possible solution might be including some distance-traveled metric in the reward function...
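
Something along these lines, as a sketch only (the paddle positions and the coefficient are invented for illustration):

    # Subtract a term proportional to total paddle travel from the episode
    # return, so needless wiggling costs a little even in winning games.
    def episode_return(game_reward, paddle_ys, lam=1e-4):
        travel = sum(abs(paddle_ys[i + 1] - paddle_ys[i])
                     for i in range(len(paddle_ys) - 1))
        return game_reward - lam * travel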


You could probably do that by introducing a regularization term on the activity. Possibly L1 because that will tend toward sparse outputs.
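
For instance, something like this sketch (the loss shape and names are illustrative, not the article's actual code):

    import numpy as np

    # REINFORCE-style objective with an extra L1 penalty on output activity.
    # `logps` are log-probabilities of the sampled actions, `advantages` the
    # discounted rewards, `activities` the outputs being regularised.
    def loss_with_l1(logps, advantages, activities, beta=1e-3):
        pg_loss = -np.sum(logps * advantages)        # policy gradient term
        l1_term = beta * np.sum(np.abs(activities))  # pushes toward sparsity
        return pg_loss + l1_term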


In the context of AI and gaming, I definitely recommend this series of three YouTube videos:

https://www.youtube.com/watch?v=xOCurBYI_gY

Some games are played better than a human would.


These videos by Tom7 are indeed great. They don't display the use of deep learning though; the method is something quite different [1].

Maybe you could use the objective functions of Tom7 to determine which moves are good or bad in order to train a policy network?

[1]: http://www.cs.cmu.edu/~tom7/mario/mario.pdf


A neural network specializes in one particular problem set? Can you not create meta-neural networks that reconnect the specialized networks or grow new ones?



