
You batch them. If the token limit is 32k, for example, you summarize in batches of 32k tokens (including output), then summarize all the partial summaries.

It's what we were doing at our company until Anthropic and others released larger-context-window LLMs. We do the STT locally (whisperX) and the summarization via API, though we've tried local LLMs too.
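The batch-then-recurse idea above can be sketched in a few lines. This is just an illustration, not anyone's production code: `summarize` is a hypothetical stand-in for a real LLM API call, and word count is used as a crude proxy for tokens (real code would use the model's tokenizer).

```python
def summarize(text: str, max_words: int = 50) -> str:
    # Placeholder for an actual LLM call; here it just truncates.
    return " ".join(text.split()[:max_words])

def chunks(words: list[str], batch_size: int):
    # Yield the input split into batches of at most batch_size "tokens".
    for i in range(0, len(words), batch_size):
        yield " ".join(words[i:i + batch_size])

def summarize_long(text: str, batch_size: int = 32_000) -> str:
    words = text.split()  # crude token count
    if len(words) <= batch_size:
        return summarize(text)  # fits in one call
    # Map: summarize each batch independently.
    partials = [summarize(part) for part in chunks(words, batch_size)]
    # Reduce: recurse on the concatenated partial summaries
    # until the whole thing fits in one context window.
    return summarize_long("\n".join(partials), batch_size)
```

For very long inputs the reduce step may itself exceed the window, which is why it recurses instead of doing a single final pass.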





