

terhechte 6 months ago | on: Qwen2.5-1M: Deploy your own Qwen with context leng...

Yes. It requires a lot of RAM, and even on an M4 with a lot of RAM, if you give it 1 million tokens, the prompt processing alone (that is, the time before you get the first response token) will probably take ~30 min or more. However, I'm looking forward to checking whether I can indeed give it a whole codebase and ask questions about it.
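For a rough sense of scale, the ~30 min figure for 1 million tokens implies a prompt-processing rate of roughly 556 tokens per second. The sketch below just does that arithmetic; the function names are made up for illustration, and the assumption that time scales linearly with prompt length is a simplification (attention cost grows faster than linearly, so long prompts are probably slower than this suggests).

```python
def implied_tokens_per_second(prompt_tokens: int, minutes: float) -> float:
    """Prompt-processing rate implied by a token count and a wall-clock time."""
    return prompt_tokens / (minutes * 60.0)


def estimated_prefill_minutes(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to first token in minutes, ASSUMING linear scaling with prompt length."""
    return prompt_tokens / tokens_per_second / 60.0


# Figures from the comment above: 1 million tokens in about 30 minutes.
rate = implied_tokens_per_second(1_000_000, 30)
print(f"implied rate: {rate:.0f} tokens/s")

# Hypothetical smaller prompt, e.g. a 250k-token codebase.
print(f"250k tokens: ~{estimated_prefill_minutes(250_000, rate):.1f} min")
```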

marci 6 months ago

You might want to try caching to a file with MLX.

https://github.com/ml-explore/mlx-examples/pull/956

Edit: here's a quick example for Qwen2.5-1M from an MLX dev

https://x.com/awnihannun/status/1883611098081099914
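The general idea of caching to a file can be sketched with the standard library alone: pay the expensive prompt-processing cost once, write the result to disk, and reload it for later questions about the same prompt. This is not MLX's API and not what the linked pull request implements; `process_prompt_cached`, `cache_path`, and the stand-in `fake_process` are hypothetical names used only to illustrate the pattern.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, prompt: str) -> Path:
    """Derive a stable file name from the prompt text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.pkl"


def process_prompt_cached(prompt: str, cache_dir: Path, process):
    """Return (state, cache_hit). `process` stands in for the slow prefill step."""
    path = cache_path(cache_dir, prompt)
    if path.exists():
        # Cache hit: load the saved state instead of reprocessing the prompt.
        with path.open("rb") as f:
            return pickle.load(f), True
    state = process(prompt)
    with path.open("wb") as f:
        pickle.dump(state, f)
    return state, False


def fake_process(prompt: str) -> dict:
    """Placeholder for real prompt processing (assumption: returns picklable state)."""
    return {"num_words": len(prompt.split())}


with tempfile.TemporaryDirectory() as tmp:
    codebase = "def main(): pass"  # imagine a whole codebase here
    _, first_hit = process_prompt_cached(codebase, Path(tmp), fake_process)
    _, second_hit = process_prompt_cached(codebase, Path(tmp), fake_process)
    print(first_hit, second_hit)
```

With a real model, the saved state would be the model's key/value cache for the prompt, so only the first run pays the long time to first token.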

terhechte 6 months ago

That's cool, thank you, but does MLX support the Qwen 1M context yet?

marci 6 months ago

According to the tweet, not the full context yet.
