Just reading page 15: "arg max = maximal ...". I think "global maximum" or "local maximum" would be better than "maximal".
I would like to read about the most interesting results of RL in just one hour. Can someone suggest a short book for a reader with advanced maths skills?
Thanks a lot to the authors; the book seems really interesting.
Edit: On page 25, in the extended tic-tac-toe example, the rule for updating the value of each state, v(s) = v(s) + a(v(s') - v(s)), doesn't take into account that if the policy has a winning strategy from s', then s is also part of a winning strategy. So if v(s') = 1 (win), then v(s) = 1 (I can win).
In my very humble opinion, the authors should devote a section to this very important point.
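For reference, here is a minimal Python sketch of what that update rule does when v(s') = 1. The step size 0.1, the starting estimate 0.5, and the function name `td_update` are illustrative assumptions, not values taken from the book.

```python
def td_update(v_s, v_next, alpha):
    """One step of v(s) = v(s) + alpha * (v(s') - v(s))."""
    return v_s + alpha * (v_next - v_s)

V_WIN = 1.0  # v(s') for a state that is a win
v = 0.5      # assumed initial estimate for s

# A single update with a small step size moves v(s) only part of the way.
one_step = td_update(v, V_WIN, alpha=0.1)

# Repeated visits where s' is again a win pull v(s) steadily toward 1.
for _ in range(50):
    v = td_update(v, V_WIN, alpha=0.1)
after_many = v

# With alpha = 1 the estimate jumps to v(s') after one game.
full_step = td_update(0.5, V_WIN, alpha=1.0)

print(one_step, after_many, full_step)
```

With a small step size, the estimate for s only approaches 1 if s' keeps turning out to be a win over repeated visits; with alpha = 1 it gets there after a single game.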
The book is hundreds of pages long; if the authors digressed to cover everything in Chapter 1, it would be a mess.
The scenario you describe is the case alpha = 1, and it would do poorly. Try thinking about games where the opponent doesn't play an optimal game. Try thinking of stochastic environments.
Let's pretend alpha = 1 on a win and alpha = 0.1 on a loss.
Imagine a scenario where you play a game, the opponent plays poorly, and you win. You then try to repeat the same thing, but this time the opponent has learnt from their mistakes and beats you. You'll keep playing the same losing move significantly more times because it worked that one time.
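A rough Python sketch of that scenario, under some assumptions of mine: the move's value starts at 0.5, it wins once and then loses every time, and the player stops choosing it once its value drops below 0.5. The function names and the threshold are hypothetical, chosen only to make the comparison concrete.

```python
def td_update(v, target, alpha):
    """Move the estimate v a fraction alpha of the way toward target."""
    return v + alpha * (target - v)

def losses_until_abandoned(alpha_win, alpha_loss, v0=0.5, threshold=0.5):
    """Count losses before the move's value falls below the threshold.

    Assumed sequence of outcomes: one lucky win, then only losses.
    """
    v = td_update(v0, 1.0, alpha_win)  # the single win (target 1)
    losses = 0
    while v >= threshold:
        v = td_update(v, 0.0, alpha_loss)  # each loss (target 0)
        losses += 1
    return losses

# alpha = 1 on a win, alpha = 0.1 on a loss, as in the comment above.
asymmetric = losses_until_abandoned(alpha_win=1.0, alpha_loss=0.1)

# The same small step size for wins and losses.
symmetric = losses_until_abandoned(alpha_win=0.1, alpha_loss=0.1)

print(asymmetric, symmetric)  # 7 1
```

Under these assumptions the asymmetric learner repeats the losing move seven times before giving it up, while the learner with a single small step size drops it after one loss.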
I don't know why everyone wants to second-guess the first chapter of the standard textbook in this space with what seems like no experience even thinking about this topic...