Mistral, which clocks in at 32k context
- Mistral can handle 32k context, but only via sliding window attention: each layer attends to a local window, so distant tokens influence each other only indirectly through stacked layers, and the model never attends over all 32k tokens at once.
- Mixtral (note the 'x') 8x7B can handle 32k context without resorting to sliding window attention.
I wonder whether Mistral would do a better job summarizing a long (32k-token) doc all at once or by summarizing it recursively (chunk the doc, summarize each chunk, then summarize the summaries).
Maybe a neat eval to try.
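A minimal sketch of what the recursive side of that eval could look like, assuming a map-reduce style loop. `summarize` is a hypothetical stand-in for a single Mistral completion call, and the character-based chunk size is an arbitrary placeholder (a real eval would split on token counts):

```python
# Hypothetical sketch: one-shot vs. recursive (map-reduce) summarization.
# `summarize` stands in for a single Mistral completion call; wire in a
# real client. Chunk sizes are character-based placeholders, not tokens.

def summarize(text: str) -> str:
    """One LLM call, e.g. 'Summarize the following text: ...'."""
    raise NotImplementedError("plug in a Mistral client here")

def chunks(text: str, size: int) -> list[str]:
    """Naive fixed-size splits; a real eval would use token boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def one_shot(doc: str) -> str:
    """Baseline: hand the whole ~32k-token doc to the model in one call."""
    return summarize(doc)

def recursive(doc: str, size: int = 16_000) -> str:
    """Summarize chunks, concatenate the summaries, and repeat until the
    remaining text fits in a single call."""
    while len(doc) > size:
        doc = "\n\n".join(summarize(c) for c in chunks(doc, size))
    return summarize(doc)
```

Running both over the same set of long docs and scoring the outputs (e.g. against reference summaries or with a judge model) would make the comparison concrete.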