
Reading the paper, it doesn't sound at all like AlphaGo uses anything that TD-Gammon used.

It uses MCTS rather than minimax, and it doesn't use temporal-difference learning, although the authors say the policy training somewhat resembles TD.
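(Aside, for anyone who hasn't seen it: the TD update nudges a state's value toward the immediate reward plus the discounted value of the successor state. A minimal tabular TD(0) sketch; the env_step interface and the hyperparameters here are my own illustrative assumptions, not anything from either paper:

    from collections import defaultdict

    def td0(env_step, start_state, episodes=1000, alpha=0.1, gamma=0.99):
        # env_step(state) -> (next_state, reward, done) under some fixed policy.
        # The interface and hyperparameters are illustrative assumptions.
        V = defaultdict(float)
        for _ in range(episodes):
            s, done = start_state, False
            while not done:
                s2, r, done = env_step(s)
                target = r if done else r + gamma * V[s2]
                V[s] += alpha * (target - V[s])  # move V(s) toward the bootstrapped target
                s = s2
        return V

TD-Gammon replaced the table with a neural network trained by TD(lambda); AlphaGo's value network was instead fit by regression on self-play game outcomes, which is why the resemblance is only partial.)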

That doesn't sound like 'essentially built on'; it sounds more like 'slightly influenced by'.

You're missing the forest for the trees.

Tesauro's work on TD-Gammon was pioneering at the high level, i.e. combining reinforcement learning + self-play + neural networks.
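To make that recipe concrete, here's a toy self-play TD learner for Nim with a pile of 21 (take 1-3 stones, taking the last one wins). It uses a lookup table where TD-Gammon used a neural network, and the game and hyperparameters are just my assumptions to keep the sketch short:

    import random

    def legal_moves(pile):
        return [pile - k for k in (1, 2, 3) if pile - k >= 0]

    def self_play_nim(games=50000, alpha=0.1, epsilon=0.1):
        # V[p] ~ win probability for the player about to move from pile p.
        V = [0.5] * 22
        V[0] = 0.0  # facing an empty pile: the opponent just took the last stone
        for _ in range(games):
            pile = 21
            while pile > 0:
                moves = legal_moves(pile)
                if random.random() < epsilon:
                    nxt = random.choice(moves)  # occasional exploration
                else:
                    nxt = min(moves, key=lambda p: V[p])  # leave the opponent their worst position
                # TD update: my value here = 1 - opponent's value after my move
                V[pile] += alpha * ((1.0 - V[nxt]) - V[pile])
                pile = nxt
        return V

    V = self_play_nim()
    print([round(V[p], 2) for p in range(22)])  # piles divisible by 4 converge toward 0 (losses)

The point being: the value estimates improve from games generated by the current value estimates, with no human games required. That loop, scaled up with function approximation, is the high-level lineage the parent is pointing at.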
