just reading page 15, arg max = maximal ..., I think that global maximum or loca...

Eridrus · on Sept 24, 2016

The book is hundreds of pages long; if he digressed to cover everything in Chapter 1, it would be a mess.

The scenario you describe is what happens if alpha=1, and it would do poorly. Try thinking about games where the opponent doesn't play an optimal game. Try thinking of stochastic environments.

piedradura · on Sept 24, 2016

What I suggest is to use this rule: if v(s')==1 then set v(s) to 1, else apply the usual rule.
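For readers following along, here is a minimal sketch of the two rules being compared. It assumes "the usual rule" means the temporal-difference update from the book's tic-tac-toe example, V(s) <- V(s) + alpha * (V(s') - V(s)); the function names and the default alpha=0.1 are made up for illustration.

```python
def td_update(v_s, v_next, alpha=0.1):
    """Usual rule (assumed): move V(s) a fraction alpha toward V(s')."""
    return v_s + alpha * (v_next - v_s)


def proposed_update(v_s, v_next, alpha=0.1):
    """Suggested variant: jump straight to 1 when the next state is a win."""
    if v_next == 1.0:
        return 1.0
    return td_update(v_s, v_next, alpha)
```

Note that the variant is the same as running the usual rule with alpha=1 whenever v(s') is 1, which is why the replies talk about step sizes.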

Eridrus · on Sept 24, 2016

Let's pretend alpha = 1 on a win and alpha = 0.1 on a loss.

Imagine a scenario where you play a game, the opponent plays poorly, and you win. You then try to repeat the same thing, but this time the opponent has learnt from their mistakes and beats you. You'll keep playing the same losing move significantly more times because it worked that one time.
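A rough way to put numbers on this, as a sketch only: assume a single state whose value starts at 0.5, a win backs up a target of 1, a loss backs up a target of 0, and the move is abandoned once its value drops below an alternative worth 0.5. The function name and the 0.5 threshold are assumptions for illustration, not something from the book.

```python
def losses_until_abandoned(alpha_win, alpha_loss, start=0.5, threshold=0.5):
    """Count losses needed, after one lucky win, before the value falls below threshold."""
    # One win: move the value toward 1 with the win step size.
    v = start + alpha_win * (1.0 - start)
    losses = 0
    # Each loss moves the value toward 0 with the loss step size.
    while v >= threshold:
        v += alpha_loss * (0.0 - v)
        losses += 1
    return losses


asymmetric = losses_until_abandoned(alpha_win=1.0, alpha_loss=0.1)  # 7 losses
symmetric = losses_until_abandoned(alpha_win=0.1, alpha_loss=0.1)   # 1 loss
```

Under these assumptions the asymmetric step sizes take seven losses to undo one win, while a uniform alpha of 0.1 takes one.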

I don't know why everyone wants to second-guess the first chapter of the standard textbook in this space with what seems like no experience even thinking about this topic...

piedradura · on Sept 24, 2016

When you lose, the value of v' changes, and so the value of v changes too.

pierrelux · on Sept 24, 2016

Short and mathematical: "Algorithms for Reinforcement Learning". PDF available: https://sites.ualberta.ca/~szepesva/RLBook.html

piedradura · on Sept 24, 2016

Thanks, the online pdf seems to be very good.