You batch them. If the context limit is 32k tokens, for example, you summarize the transcript in batches sized to fit within 32k tokens (including the output), then summarize all the partial summaries together.
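A minimal sketch of that hierarchical batching, assuming a hypothetical `summarize` function standing in for the LLM API call (the chunking and recursion are the point, not the call itself):

```python
# Hierarchical ("batch then summarize the summaries") sketch.
# `summarize` is a hypothetical stand-in for a real LLM API call.

def chunk_tokens(tokens, limit):
    """Split a token list into consecutive batches of at most `limit` tokens."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]

def summarize(tokens):
    # Placeholder: a real implementation would call the LLM here.
    # Pretend the summary is ~1/10th the input length.
    return tokens[: max(1, len(tokens) // 10)]

def hierarchical_summary(tokens, limit):
    """Repeatedly summarize in batches until one context window suffices."""
    # In practice you'd chunk at something below `limit` to leave
    # headroom for the output tokens, since the limit includes them.
    while len(tokens) > limit:
        batches = chunk_tokens(tokens, limit)
        partials = [summarize(b) for b in batches]
        tokens = [t for p in partials for t in p]  # join partial summaries
    return summarize(tokens)
```

Each pass shrinks the input by roughly the summarization ratio, so even very long transcripts converge to a single in-context summary in a few rounds.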
It's what we were doing at our company until Anthropic and others released larger-context-window LLMs. We do the transcription locally (whisperX) and the summarization via API, though we've tried it with local LLMs, too.