I believe this kind of graph exploration is what we need to progress reasoning in AI. Plain LLMs will fail. The link has tons of good references, including Zobrist hashing (https://en.wikipedia.org/wiki/Zobrist_hashing) for game transposition tables. We need to find a good hash for language-based state descriptions so that graph exploration doesn't explode computationally. Another good read for tree search is Thinking Fast and Slow with Deep Learning and Tree Search (https://arxiv.org/abs/1705.08439), and Teaching Large Language Models to Reason with Reinforcement Learning (https://arxiv.org/abs/2403.04642), which compares the MCTS approach to other current RL strategies.
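For anyone who hasn't seen it, here's a minimal sketch of Zobrist hashing in Python. The board layout (64 squares, 12 piece types) and the board dict are my own illustrative assumptions, not something from the article:

    import random

    # Fixed table of random 64-bit keys, one per (square, piece) pair
    # (assuming a chess-like board: 64 squares, 12 piece types).
    random.seed(0)
    ZOBRIST = [[random.getrandbits(64) for _ in range(12)] for _ in range(64)]

    def board_hash(board):
        # board: dict mapping square index -> piece index (0..11)
        h = 0
        for square, piece in board.items():
            h ^= ZOBRIST[square][piece]
        return h

    def update_hash(h, square, old_piece, new_piece):
        # Incremental update when a square changes: XOR out the old key,
        # XOR in the new one. This is what makes hashing every node cheap.
        if old_piece is not None:
            h ^= ZOBRIST[square][old_piece]
        if new_piece is not None:
            h ^= ZOBRIST[square][new_piece]
        return h

The open question in the comment is what the analogue of those per-square keys would be for a language-based state description.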
What might be a step forward is joint learning of the state representation with the search algorithm: the search algorithm explores the NN's representation of the state, for which you can obtain a cost.
Genie from DeepMind is a good demonstration of modeling a discrete state: the NN learns a very complex representation with collision detection and actions. Instead of decoding that state into pixels, search could probably be done directly on it.
Of course, this architecture could be very different.
Active Inference is kinda "a joint learning of the state representation with the search algorithm"...or at least, related to the idea. I like framing it as a joint learning problem.
Would it be impossible to marry the two somehow? It’s hard for me to believe that the brain only uses a single technique for everything and likely has many different tools in its toolbox with a selector on top to know how to leverage each appropriately.
My chess experience is not that extensive, but it's enough that I'm skeptical of the claim that the same position would be duplicated in a search tree often enough for it to matter. I'd be interested to see an actual measurement of this with Leela Zero. (And I'm not even considering the threefold-repetition and fifty-move rules, which, when included in the state, make a repeated state much less likely.)
Ko is very common in Go, though. You can't validly repeat the board position (a simplified way of handling the ko rules), but if tree search failed to evaluate ko positions correctly, it would be easy to construct situations where the AI makes bad moves.
> Also, as far as the name "Monte-Carlo Tree Search" itself, readers might note that there is nothing "Monte-Carlo" in the above algorithm - that it's completely deterministic!
MCTS as commonly implemented is deterministic? How strange! I assumed there was randomness in the sampling.
Originally MCTS did have randomness, yes. I think the article mentions it, but it came in the form of playouts at the end, used just to evaluate positions. This has been replaced in current similar projects by NN evaluations, which are higher quality (as you can probably imagine, just playing random moves to see who wins is not a great evaluator; it was simply the best strategy known at the time).
So the Monte Carlo part turned out to be an inessential (and suboptimal) component of what is still called MCTS, making the name a bit unfortunate.
Small side note: standard MCTS is often implemented with a heuristic guiding the moves in the playouts rather than purely random moves, as that gives more useful information.
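To make the two playout styles concrete, here's a sketch; the state interface (legal_moves / play / is_terminal / result) and the score_move heuristic are assumptions for illustration:

    import random

    def random_playout(state, rng):
        # Classic Monte Carlo evaluation: play uniformly random moves to the
        # end of the game and return the result (e.g. +1 / 0 / -1).
        while not state.is_terminal():
            state = state.play(rng.choice(state.legal_moves()))
        return state.result()

    def heuristic_playout(state, score_move, rng):
        # Same idea, but bias move selection with a cheap non-negative
        # heuristic score instead of sampling uniformly.
        while not state.is_terminal():
            moves = state.legal_moves()
            weights = [score_move(state, m) for m in moves]
            state = state.play(rng.choices(moves, weights=weights, k=1)[0])
        return state.result()

In AlphaZero-style engines the playout is dropped entirely and the position is scored by a value network instead.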
It's technically a different algorithm just under the same "monte carlo" name.
However, something interesting to note is that since most applications of Monte Carlo methods rely on PRNGs instead of true RNGs, they are technically deterministic (given the same PRNG seed, you will always get the same result for a given input).
So instead of using a normal PRNG and a separate heuristic, this algorithm queries the neural network. This works because the neural net is a heuristic over a massive search space, which ends up acting like a very poor PRNG heavily biased towards specific results by its training, which in turn ends up looking like a PRNG with a set of heuristics applied.
The important thing to note is that this is a specialisation of MCTS, and as such it shouldn't be expected to work for all use cases.
I can't speak for this use case, but in many other places one isn't even using a PRNG, but rather quasi-Monte Carlo techniques with no randomness at all. Monte Carlo doesn't need randomness, just good coverage of the search space.
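As a small illustration of "coverage without randomness", here's a Halton low-discrepancy sequence, a standard quasi-Monte Carlo construction (my example, not something from the article):

    def halton(index, base):
        # Radical inverse of `index` in the given base: a deterministic
        # sequence that fills the unit interval evenly.
        result, f = 0.0, 1.0
        while index > 0:
            f /= base
            result += f * (index % base)
            index //= base
        return result

    # 2-D quasi-random points covering the unit square, with no RNG at all.
    points = [(halton(i, 2), halton(i, 3)) for i in range(1, 9)]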
Somehow the paper they mention completely flew under my radar when I was researching MCTS. It's surely going to be a lot of fun to give this modification a spin at the next opportunity.
When we make game-playing AI (which is all AI, depending on your analogy comfort), one of the most promising techniques is Tree Search, which ranks moves based on their descendant moves. In games where the same state can be reached in many ways, a lot of memory can be wasted re-recording the same state node on different branches.
This article is a nice exploration of an approach called Graph Search, which essentially trades compute for memory: it does extra work (hashing the game states) to check whether a node has already been visited. That saves us from re-recording nodes we have already seen and, consequently, converts trees (free of cycles) into directed acyclic graphs.
This forces some tinkering with the tree search to get correct results; specifically, it shifts the focus to edges (actions or moves) as the unit of optimization, rather than vertices (states). It's a well-written technical essay in literate programming by someone who understands their subject.
I would argue that this saves both compute and memory for any significant search. A traditional tree search that doesn't treat states as identical and revisitable still requires you to store every node you visit in memory to track the N, Q, and child relations.
Applying the algorithm adds a hash associated with each node and a counter on each edge. On the other hand, you're de-duplicating identical states in your search space, saving the memory of duplicate nodes and, more importantly, saving compute by not requiring independent searches of the same state space. Whether this trade-off actually saves memory depends on how likely duplicate states are in your search space.
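A rough sketch of that bookkeeping (names and structure are my own, not the article's exact implementation): one node per distinct state hash, with a visit counter on each edge:

    class Node:
        def __init__(self):
            self.N = 0        # state visit count
            self.Q = 0.0      # running value estimate for the state
            self.edges = {}   # move -> [child_hash, edge_visit_count]

    # Transposition table: one Node per distinct state hash, however many
    # paths reach that state.
    nodes = {}

    def get_node(state_hash):
        # A repeated state reuses its existing node instead of being
        # re-recorded on another branch, so the search tree becomes a DAG.
        if state_hash not in nodes:
            nodes[state_hash] = Node()
        return nodes[state_hash]

    def visit_edge(parent_hash, move, child_hash):
        # Record a traversal of (parent, move); the per-edge counter is the
        # quantity the graph search optimizes over.
        edge = get_node(parent_hash).edges.setdefault(move, [child_hash, 0])
        edge[1] += 1
        return get_node(child_hash)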