I don't know, I think that extending context windows is actually detrimental because people assume they can just dump things in there until it fills up. You still have to deal with the models' limited attention, and filling the context with only the material relevant to the problem you're actually trying to solve is going to be the most effective approach. If you have too much information to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote, at over 1000 pages, is less than 64,000 tokens.
Well, I loaded up Llama 3 and downloaded the novel: the English translation comes to 545,997 tokens and the original Spanish to 653,981. So my estimate was off by an order of magnitude. Thanks for the correction.
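If anyone wants to check the numbers, something along these lines works. It's just a rough sketch assuming the Hugging Face transformers tokenizer for Llama 3; the model ID and file path are placeholders, not necessarily the exact ones I used:

```python
# Rough sketch: count tokens in a plain-text copy of the novel
# using a Llama 3 tokenizer loaded via Hugging Face transformers.
# Model ID and file path are examples only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

with open("don_quixote_english.txt", encoding="utf-8") as f:
    text = f.read()

# encode() returns the list of token IDs; its length is the token count.
token_ids = tokenizer.encode(text)
print(f"{len(token_ids)} tokens")
```

Point the same script at the Spanish text to get the other count.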