The entire HP series is about one million words.




Harry Potter and the Order of the Phoenix alone is 400K tokens.

Curious, I found an epub, converted it to a txt, and dumped it into the Qwen3 tokenizer. It yielded 359,088 tokens, end to end.
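For anyone who wants to reproduce this, something like the sketch below should work. The checkpoint name and file path are placeholders; as far as I know the Qwen3 checkpoints share a tokenizer, so which one you pick shouldn't change the count.

    # Rough sketch: count tokens with the Hugging Face Qwen3 tokenizer.
    # "Qwen/Qwen3-8B" and the file name are placeholders, not what the
    # comment above necessarily used.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
    with open("order_of_the_phoenix.txt", encoding="utf-8") as f:
        text = f.read()
    print(len(tok.encode(text)))  # the comment above reports 359,088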

Using the GPT-4 tokenizer (cl100k_base) yields 349,371 tokens.
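The tiktoken version is nearly a one-liner, reusing the same text:

    # Count with OpenAI's tiktoken and the cl100k_base encoding (GPT-4).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    print(len(enc.encode(text)))  # the comment above reports 349,371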

Recent Google and Anthropic models don't ship local tokenizers; ridiculously, you have to call their APIs to count tokens, so no idea about those.

Just thought that was interesting.


And it takes up a proportional width on everyone's bookshelves alongside the others.


