The tokenizer does seem to have a serious drawback for chess notation.
With standard chess notation, the model is forced to lock in the column choice before the row choice. It can't consider a move as a whole, so it has to model longer-range dependencies to accurately predict the best next move. Worse, once the column token is sampled, the model may never be able to pick what was actually the second-best move for that position.
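A toy sketch of the problem (all probabilities here are made up for illustration): if a square like "d4" is emitted as a column token followed by a row token, greedy decoding commits to the most likely column before it sees any row probabilities, and that can miss the move with the highest overall probability.

```python
# Hypothetical per-token probabilities for three candidate columns.
col_probs = {"e": 0.40, "d": 0.35, "c": 0.25}
row_probs_given_col = {
    "e": {"4": 0.30, "5": 0.30, "6": 0.40},  # probability mass spread across rows
    "d": {"4": 0.90, "5": 0.05, "6": 0.05},  # mass concentrated on d4
    "c": {"4": 0.50, "5": 0.30, "6": 0.20},
}

# Greedy decoding: lock in the most likely column, then the best row for it.
greedy_col = max(col_probs, key=col_probs.get)
greedy_row = max(row_probs_given_col[greedy_col],
                 key=row_probs_given_col[greedy_col].get)
greedy_move = greedy_col + greedy_row
greedy_p = col_probs[greedy_col] * row_probs_given_col[greedy_col][greedy_row]

# Whole-move scoring: compute the joint probability of every complete square.
joint = {
    c + r: col_probs[c] * p
    for c, rows in row_probs_given_col.items()
    for r, p in rows.items()
}
best_move = max(joint, key=joint.get)

print(greedy_move, round(greedy_p, 3))        # e6 0.16
print(best_move, round(joint[best_move], 3))  # d4 0.315
```

Greedy lands on "e6" with joint probability 0.16, while scoring whole moves finds "d4" at 0.315 — the column-first commitment threw away the better move. Beam search or a tokenizer that keeps whole squares as single tokens would avoid this particular failure.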