Not OP, but I believe the low score is due to not enough training time and incorrect parameters such as the frameskip. Space Invaders was mentioned as one of the few games where they needed to lower the frameskip (from 4 to 3, or 2?) because of the flashing lasers. I'm assuming OP left the parameters as-is from Nathan Sprague's implementation, which has the frameskip at 4, and only trained for a few epochs.
Interesting. I'm not sure how this DeepMind game was played. Note that a typical game with epsilon = 0.1 scores around 550 points, while in the Nature paper they ran the evaluation with epsilon = 0.05 (see the quick sketch of epsilon-greedy below).
If you spot any other differences I'd be happy to learn about them.
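In case it helps: epsilon here is just the probability of taking a random action instead of the greedy one during evaluation. A rough sketch (the function and variable names are mine, not from any of the implementations mentioned):

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: one predicted Q-value per action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

So a higher epsilon at evaluation time means more random actions, which generally lowers the score.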
I learned about Q-learning in the Berkeley AI course (the Pac-Man course), so I sort of get that, but the course didn't touch neural networks.
What's the difference between Q-learning with and without neural networks? Or rather, in the process of doing Q-learning, where does the neural network slot in, and what does it replace if there is no NN?
Note that a neural network is just a very complex function.
You usually think of Q as a function (S, A) -> (Expected accumulated future reward)
which is equivalent to S -> A -> (Expected accumulated future reward)
the neural network is S -> (A -> (Expected accumulated future reward)),
or, if you wish, the output layer of the neural network consists of |A| neurons, each indicating the (Expected accumulated future reward) for one action given the current state.
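To make that concrete, here is a minimal sketch (plain numpy, made-up layer sizes; this is not the DQN architecture from the paper) of a network mapping a state vector to |A| Q-values:

```python
import numpy as np

class TinyQNetwork:
    """Toy Q-function approximator: state vector in, one Q-value per action out."""

    def __init__(self, state_dim, num_actions, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_actions))

    def q_values(self, state):
        h = np.maximum(0.0, state @ self.w1)  # ReLU hidden layer
        return h @ self.w2                    # |A| outputs: one Q(s, a) per action

net = TinyQNetwork(state_dim=4, num_actions=6)
print(net.q_values(np.zeros(4)))  # six numbers, one estimate per action
```

Tabular Q-learning would instead keep a literal table indexed by (state, action), which doesn't scale to raw game frames.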
So what we are saying is that a neural network can be used as the implementation of the Q-function? I.e., a Q-function is by definition only a mapping of (S, A) pairs to an expected future reward. We can compute it with a traditional approach like value iteration, or we can approximate it with a neural network trained by backpropagation? And it's just a matter of implementation?
Yes, we try to approximate the Q-function with a neural network,
which is basically an enhanced version of gradient-descent Sarsa.
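For what it's worth, the core difference between the two is just the target used for the gradient step (a rough sketch, not anyone's actual code; `gamma` is the discount factor):

```python
def sarsa_target(reward, next_q_values, next_action, gamma=0.99, done=False):
    # On-policy: bootstrap from the action that will actually be taken next.
    return reward if done else reward + gamma * next_q_values[next_action]

def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    # Off-policy (Q-learning / DQN): bootstrap from the best action according to Q.
    return reward if done else reward + gamma * max(next_q_values)
```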
The main trick to notice is that you can't provide consecutive frames as mini-batches, as these would be highly correlated and would derail stochastic gradient descent.
So we keep many past frames (and all other necessary information) in memory and draw these experiences uniformly at random to form a minibatch that becomes the input to the neural network.
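A rough sketch of what that replay memory can look like (my own simplification, not the actual DeepMind or Nathan Sprague implementation):

```python
import random

class ReplayMemory:
    """Ring buffer of (state, action, reward, next_state, done) transitions.

    Sampling uniformly at random breaks the correlation between consecutive frames.
    """

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.buffer = []
        self.pos = 0

    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)
        self.buffer[self.pos] = (state, action, reward, next_state, done)
        self.pos = (self.pos + 1) % self.capacity  # overwrite the oldest entry

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

Each sampled transition then gets a target like the one sketched above, and the network is trained on the whole minibatch at once.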
As an amateur, I've always wondered if reinforcement learning could work with games where there are some probabilities in place (e.g. poker).
What happens when the action taken is a good one but the outcome is negative due to bad luck?
Absolutely. Q-learning has this capability, and a shallow neural network was used back in 1992 to play backgammon, which has a lot of stochasticity.
See https://en.wikipedia.org/wiki/TD-Gammon
I would like to learn more about the techniques used here. Can anyone recommend some books? Online materials are fine too, but I generally find those worse. I have a moderately strong math background (undergraduate degree with a double major in CS/Math).
Someone posted this comment in another thread [1]. (Also read the two parent comments of that comment.)
Essentially: read and do the exercises in ISLR (An Introduction to Statistical Learning, with Applications in R). It will both give you a strong base and increase your job prospects (according to the comment).
P.S.: my personal opinion: for every R exercise in that book, try to do a similar exercise in Python. If you don't know Python, learn it (you'll thank me later).
Now, train the network jointly over the whole game sequence. Or even better, whenever there is a chance to take an action, roll out each action and learn jointly on the rest of that gameplay.
Reinforcement learning is very hard, especially when you create meaningful games and then don't use the fact that a whole game is one long chain of events, and instead force learning on windowed sequences.
The neural network has enough parameters to memorize much of these windows and will clearly perform well, but the training lasts too long given that no structured information is used.
https://m.youtube.com/watch?v=ZisFfiEdQ_E
For comparison, DeepMind:
https://m.youtube.com/watch?v=ePv0Fs9cGgU
Note that 550 is a very low score in Space Invaders.