Hacker News new | past | comments | ask | show | jobs | submit login

I do not believe this is still true in MuZero, where no rules are ever explicitly encoded, beyond what leaks in from the reward function?



Indeed, in this latest iteration the network learns a model of the game by itself. Note however that it still uses MCTS and performs a look-ahead, i.e. it is still being "told" by the programmers how to do search (planning), only now it is left to the system to determine how it wants to represent states/actions/policies internally.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: