
I concur. In my work (analysing news show transcripts and descriptions), I deal with at most about 250k input tokens. Tasks include:

- Summarize topics (with references to shows)
- Find quotes specific to a topic (again with references)

Anything above 32k tokens of input fails to achieve acceptable recall, across GPT-4o, Sonnet, and Google's Gemini Flash 1.5 and 2.0.
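
If recall is only reliable below ~32k tokens, the practical workaround is to chunk the corpus, query each chunk separately, and merge the answers. A minimal sketch of that map-reduce pattern, assuming the OpenAI Python client and tiktoken for token counting; the model name, chunk budget, and merge prompt are illustrative assumptions, not anyone's production setup:

    import tiktoken
    from openai import OpenAI

    client = OpenAI()
    enc = tiktoken.get_encoding("cl100k_base")

    CHUNK_BUDGET = 24_000  # stay well under the ~32k recall cliff (assumption)

    def chunk_by_tokens(text, budget=CHUNK_BUDGET):
        """Split text into pieces of at most `budget` tokens."""
        tokens = enc.encode(text)
        for i in range(0, len(tokens), budget):
            yield enc.decode(tokens[i:i + budget])

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any long-context chat model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def find_quotes(corpus, topic):
        """Map: query each chunk on its own. Reduce: merge partial answers."""
        partial = [
            ask(f"{chunk}\n\nList quotes from the text above about "
                f"'{topic}', with the show each quote came from.")
            for chunk in chunk_by_tokens(corpus)
        ]
        return ask("Merge these partial quote lists, dropping duplicates:\n\n"
                   + "\n\n".join(partial))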

I suppose it kind of makes sense, given that large context windows are often implemented via things like sparse attention.
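
To make "sparse attention" concrete: in a sliding-window variant, each token only attends to nearby tokens, so a distant fact is never read directly. A toy illustration of that masking (pure NumPy, single head; not any vendor's actual implementation):

    import numpy as np

    def sliding_window_attention(q, k, v, window):
        """Single-head attention where position i may only attend to
        positions j with |i - j| <= window; everything else is masked."""
        n, d = q.shape
        scores = q @ k.T / np.sqrt(d)
        i, j = np.arange(n)[:, None], np.arange(n)[None, :]
        scores = np.where(np.abs(i - j) <= window, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        # token n-1 gets zero direct signal from tokens before n-1-window
        return w @ v

    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
    out = sliding_window_attention(q, k, v, window=4)

Real long-context models stack many such layers and mix in global tokens, so information can still hop across windows, but each hop is lossy, which would be consistent with recall degrading as context grows.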

What could be the reason? Do they selectively skip tokens to make it appear they support the full context?
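
One way to probe this empirically is a needle-in-a-haystack test: bury a known fact partway through otherwise irrelevant text, ask for it back, and scale the filler until retrieval fails. A minimal sketch, again assuming the OpenAI Python client; the model, filler, and needle are placeholders:

    from openai import OpenAI

    client = OpenAI()

    def needle_probe(filler, needle, question, n_copies):
        """Hide `needle` midway through n_copies of filler and ask for it.
        If answers degrade as n_copies grows, recall depends on context
        length, whatever the underlying mechanism turns out to be."""
        parts = [filler] * n_copies
        parts.insert(n_copies // 2, needle)
        haystack = "\n\n".join(parts)
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model
            messages=[{"role": "user",
                       "content": f"{haystack}\n\n{question}"}],
        )
        return resp.choices[0].message.content

    # e.g. sweep n_copies over (100, 500, 2000, ...) and note when the
    # answer stops containing "7421":
    # needle_probe("Filler paragraph about nothing in particular.",
    #              "The code word on Tuesday's show was 7421.",
    #              "What was the code word on Tuesday's show?", 2000)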