
Well, it'll always depend on the length of the meeting being summarized. But they are using Mistral, which clocks in at a 32k context. With an average of 150 spoken words per minute and 1 token ~= 1 word (which is rather pessimistic), that's about 3h30m of meeting. So I guess that's okay?
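The back-of-envelope arithmetic above can be sketched directly; the 150 words/minute and 1 token/word figures are the parent's assumptions, not anything from Mistral's docs:

```python
# Assumed values from the comment above, not from any official source.
context_tokens = 32_000      # Mistral's advertised context length
words_per_minute = 150       # rough speaking rate
tokens_per_word = 1.0        # simplifying assumption; real tokenizers emit more

minutes = context_tokens / (words_per_minute * tokens_per_word)
hours, rem = divmod(minutes, 60)
print(f"{int(hours)}h{int(rem)}m")  # roughly 3h33m, i.e. ~3h30m of meeting
```

With a more realistic ~1.3 tokens per word, the budget drops to around 2h44m, which is why the 1:1 assumption is the optimistic bound on meeting length.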



  mistral which clocks at 32k context
I may be wrong, but my understanding was/is:

- Mistral can handle 32k context, but only using sliding window attention. So a given token can't directly attend to all 32k tokens at once; information from outside the window only propagates indirectly through stacked layers.

- Mixtral (note the 'x') 8x7B can handle 32k context without resorting to sliding window attention.
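The distinction in the two bullets above comes down to the attention mask. A minimal NumPy sketch of full causal attention versus a sliding-window causal mask (the window size here is illustrative, not Mistral's actual config):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Full causal attention: token i may attend to every token 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # Sliding-window causal attention: token i may attend only to the
    # last `window` tokens, i.e. positions i-window+1 .. i.
    band = np.triu(np.ones((n, n), dtype=bool), -(window - 1))
    return causal_mask(n) & band

m = sliding_window_mask(6, window=3)
# Row 5 is True only at positions 3, 4, 5: the final token sees just
# its local window in a single layer, unlike the full causal mask.
```

This is why "handles 32k context" can mean two different things: with the banded mask, long-range dependencies are only captured across layers, not within one attention step.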

I wonder whether Mistral would do a better job summarizing a long (32k token) doc all at once, or using recursive summarization.
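For reference, recursive summarization here means the map-reduce-style approach: summarize chunks, concatenate the summaries, and repeat until the text fits. A minimal sketch, where `summarize()` is a placeholder standing in for an actual model call and the chunk size is arbitrary:

```python
def summarize(text: str) -> str:
    # Placeholder: a real implementation would call the LLM here.
    return text[:200]

def recursive_summarize(text: str, chunk_chars: int = 8000) -> str:
    # Base case: the text already fits in one model call.
    if len(text) <= chunk_chars:
        return summarize(text)
    # Map: summarize each chunk independently.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = " ".join(summarize(c) for c in chunks)
    # Reduce: recurse on the concatenated partial summaries.
    return recursive_summarize(partials, chunk_chars)
```

The trade-off being asked about: recursion keeps each call short (where sliding-window models are strongest) but can lose cross-chunk context, while a single 32k-token pass preserves everything but leans on long-range attention actually working.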


Hmm. Interesting question. We had no issues using Mixtral 8x7B for this, which perhaps reinforces your point. We use fine-tuned Mistral-7B instances, but not for long-context work.

Maybe a neat eval to try.



