As someone has pointed out, with BPE you specify the vocab size, not the token size. It's a relatively simple algorithm; this Hugging Face course does a nice job of explaining it [2], and the original paper has a very readable Python example [3]. A rough sketch of the training loop is below.
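For illustration only (not the paper's exact code), here's a minimal sketch of BPE training: start from characters, repeatedly count adjacent symbol pairs, merge the most frequent pair into a new symbol, and stop once the vocab reaches the target size.

    import collections

    def train_bpe(corpus, vocab_size):
        # Each word becomes a tuple of symbols (initially characters) with a count.
        word_freqs = collections.Counter(corpus.split())
        words = {tuple(w): c for w, c in word_freqs.items()}
        vocab = {ch for w in words for ch in w}
        merges = []
        while len(vocab) < vocab_size:
            # Count adjacent symbol pairs across all words, weighted by word frequency.
            pairs = collections.Counter()
            for w, c in words.items():
                for a, b in zip(w, w[1:]):
                    pairs[(a, b)] += c
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            vocab.add(best[0] + best[1])
            # Replace every occurrence of the best pair with the merged symbol.
            new_words = {}
            for w, c in words.items():
                out, i = [], 0
                while i < len(w):
                    if i < len(w) - 1 and (w[i], w[i + 1]) == best:
                        out.append(w[i] + w[i + 1])
                        i += 2
                    else:
                        out.append(w[i])
                        i += 1
                new_words[tuple(out)] = new_words.get(tuple(out), 0) + c
            words = new_words
        return merges, vocab

    merges, vocab = train_bpe("low low low lower lowest newer newer wider", 20)
    print(merges)

Note the stopping condition is the vocab size, which is why that's the knob you tune; the length of individual tokens just falls out of whichever merges win.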
[1] https://github.com/openai/tiktoken
[2] https://huggingface.co/course/chapter6/5?fw=pt
[3] https://arxiv.org/abs/1508.07909