Having a hard time to parse what is the action space here. The paper claims: Alp...

sanxiyn · on Nov 1, 2019

It is extremely different from AlphaZero... In fact, they heavily rely on human knowledge, which is like opposite of AlphaZero. To quote the paper, "We found our use of human data to be critical in achieving good performance with reinforcement learning".

hervature · on Nov 1, 2019

Ok, you’re right. I should’ve said AlphaGo. But that in itself shows what I mean that this is almost a step backwards.

sanxiyn · on Nov 1, 2019

AlphaZero was miraculously good, almost to the point of straining credibility. AlphaGo and AlphaStar are more like normal advances. They are mostly engineering, although theoretical contributions are not trivial. (Using reinforcement learning for value network in case of AlphaGo, and multi-agent self-play setup in case of AlphaStar, since straight self-play doesn't work.)