I completely agree with you. Let me just add two remarks. First, although picking a 9x9 board does indeed make connect four intractable for brute-force search, I would be surprised if it made the game much more difficult for AlphaZero, which relies on the generalization capabilities of the network anyway. Second, using a solved game for the tutorial is a feature, not a bug: it allows precise benchmarking of the resulting agent, as a ground truth is known.
That's really cool, and I hadn't thought of that. I just wanted clarification: that means you train the agent without the deterministic solution, and your "validation/test" phases (I'm not sure what those are called in reinforcement learning) are done with the deterministic solution?
I did not see an evaluation of how close to perfect the agent becomes. Did you compute any sort of error rate (by finding moves that turn a won position into a non-won one, or a drawn position into a lost one)? And how does this error rate drop as learning advances? That would indeed be very interesting to see.
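Concretely, here is the kind of measurement I have in mind, as a rough Julia sketch. Note that `solve`, `agent_move` and `apply` are placeholder names for a perfect solver, the agent's move choice and the move-application function; they are not actual AlphaZero.jl APIs:

    # Fraction of test positions on which the agent's move lowers the
    # game-theoretic value (won -> drawn/lost, or drawn -> lost).
    # `solve(pos)` is assumed to return +1/0/-1 from the viewpoint of
    # the player to move.
    function blunder_rate(positions; solve, agent_move, apply)
        blunders = 0
        for pos in positions
            before = solve(pos)
            # After the move, `solve` scores from the opponent's side,
            # hence the negation.
            after = -solve(apply(pos, agent_move(pos)))
            blunders += after < before
        end
        return blunders / length(positions)
    end

Tracking this quantity across training checkpoints would give exactly the learning curve I'm asking about.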
My team did an implementation of AlphaZero for Connect Four a couple of years ago. Our findings are in a series of blog posts starting at https://medium.com/oracledevs/lessons-from-implementing-alph.... We didn't manage to reach a perfect policy either, but we got pretty close. You can play against some versions of the network here: https://azfour.com
Your series of blog articles has been an important source of inspiration in writing AlphaZero.jl and I cite it frequently in the documentation. Thanks to you and your team!
Admittedly, the connect four agent is still far from perfect, but there is a lot of room for improvement, as I have done very little hyperparameter tuning so far.