
I don't know, I think that extending context windows is actually detrimental, because people assume they can just dump things in there until it fills up. You still have to deal with the models' limited attention, and filling the context with only material relevant to the particular problem you're trying to solve is going to be the most effective approach. If you have too much information to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote, at over 1000 pages, is less than 64,000 tokens.


That sounds low by about 10x, assuming Don Quixote has 430k words (per Google).
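(Quick sanity check, assuming the usual rule of thumb of roughly 1.3 tokens per English word: 430,000 words x 1.3 ≈ 560,000 tokens, which is close to 10x the 64,000 figure above.)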

Still, yes, I don't know of a single model that doesn't go off the rails if you actually try to take advantage of its context length specification.


Well, I loaded up the Llama 3 tokenizer and downloaded the novel: the English translation comes to 545,997 tokens and the original Spanish to 653,981 tokens. So my estimate was indeed off by an order of magnitude. Thanks for the correction.
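
For anyone who wants to reproduce a count like this, here's a minimal sketch in Python using the Hugging Face transformers library with a Llama 3 tokenizer. The repo ID and file name are assumptions on my part (the meta-llama repo is gated), so substitute whatever tokenizer and text file you actually have access to:

    # Count the tokens in a plain-text file with a Llama 3 tokenizer.
    # Assumes access to the gated meta-llama repo on Hugging Face and a
    # local copy of the novel in don_quixote_en.txt (both placeholders).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

    with open("don_quixote_en.txt", encoding="utf-8") as f:
        text = f.read()

    # encode() returns a list of token ids; its length is the token count.
    token_ids = tokenizer.encode(text)
    print(f"{len(token_ids)} tokens")

The exact number will shift a bit depending on special tokens and which edition or translation you feed it, but it should land in the same ballpark.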



