
I concur. In my work (analysing news show transcripts and descriptions), I deal with at most about 250k input tokens. Tasks include:

- Summarize topics (with references to shows)
- Find quotes specific to a topic (again with references)

Anything above 32k tokens of input fails to achieve acceptable recall, across GPT-4o, Sonnet, and Google's Gemini Flash 1.5 and 2.0.
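
If recall is only reliable below ~32k tokens, the practical workaround is to chunk the corpus, query each chunk separately, and merge the answers. A minimal sketch of that map-reduce pattern, assuming the OpenAI Python client and tiktoken for token counting; the model name, chunk budget, and merge prompt are illustrative assumptions, not anyone's production setup:

    import tiktoken
    from openai import OpenAI

    client = OpenAI()
    enc = tiktoken.get_encoding("cl100k_base")

    CHUNK_BUDGET = 24_000  # stay well under the ~32k recall cliff (assumption)

    def chunk_by_tokens(text, budget=CHUNK_BUDGET):
        """Split text into pieces of at most `budget` tokens."""
        tokens = enc.encode(text)
        for i in range(0, len(tokens), budget):
            yield enc.decode(tokens[i:i + budget])

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any long-context chat model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def find_quotes(corpus, topic):
        """Map: query each chunk on its own. Reduce: merge partial answers."""
        partial = [
            ask(f"{chunk}\n\nList quotes from the text above about "
                f"'{topic}', with the show each quote came from.")
            for chunk in chunk_by_tokens(corpus)
        ]
        return ask("Merge these partial quote lists, dropping duplicates:\n\n"
                   + "\n\n".join(partial))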

I suppose it kind of makes sense, given that large context windows are often implemented via things like sparse attention.
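
To make "sparse attention" concrete: in a sliding-window variant, each token only attends to nearby tokens, so a distant fact is never read directly. A toy illustration of that masking (pure NumPy, single head; not any vendor's actual implementation):

    import numpy as np

    def sliding_window_attention(q, k, v, window):
        """Single-head attention where position i may only attend to
        positions j with |i - j| <= window; everything else is masked."""
        n, d = q.shape
        scores = q @ k.T / np.sqrt(d)
        i, j = np.arange(n)[:, None], np.arange(n)[None, :]
        scores = np.where(np.abs(i - j) <= window, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        # token n-1 gets zero direct signal from tokens before n-1-window
        return w @ v

    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
    out = sliding_window_attention(q, k, v, window=4)

Real long-context models stack many such layers and mix in global tokens, so information can still hop across windows, but each hop is lossy, which would be consistent with recall degrading as context grows.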

What could be the reason? Do they selectively skip tokens to make it appear they support the full context?
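
One way to probe this empirically is a needle-in-a-haystack test: bury a known fact partway through otherwise irrelevant text, ask for it back, and scale the filler until retrieval fails. A minimal sketch, again assuming the OpenAI Python client; the model, filler, and needle are placeholders:

    from openai import OpenAI

    client = OpenAI()

    def needle_probe(filler, needle, question, n_copies):
        """Hide `needle` midway through n_copies of filler and ask for it.
        If answers degrade as n_copies grows, recall depends on context
        length, whatever the underlying mechanism turns out to be."""
        parts = [filler] * n_copies
        parts.insert(n_copies // 2, needle)
        haystack = "\n\n".join(parts)
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model
            messages=[{"role": "user",
                       "content": f"{haystack}\n\n{question}"}],
        )
        return resp.choices[0].message.content

    # e.g. sweep n_copies over (100, 500, 2000, ...) and note when the
    # answer stops containing "7421":
    # needle_probe("Filler paragraph about nothing in particular.",
    #              "The code word on Tuesday's show was 7421.",
    #              "What was the code word on Tuesday's show?", 2000)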