
Yes. It requires a lot of RAM, and even on an M4 with a lot of RAM, if you give it 1M tokens, the prompt processing alone (that is, the time before you get the first response token) will probably take ~30 min or more. However, I'm looking forward to checking whether I can indeed give it a whole codebase and ask questions about it.
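
For a rough sense of what that estimate implies, here is a back-of-envelope sketch. It only uses the two numbers from the comment (1M tokens, ~30 minutes); the function name is made up for illustration, and the result is an average rate, not a benchmark. Real prompt processing slows down as the context grows, so the rate is not constant across the prompt.

```python
# Back-of-envelope: average prompt-processing rate implied by
# "1M tokens takes ~30 min before the first response token".
# Both inputs come from the comment above; nothing here is measured.

def implied_prefill_rate(prompt_tokens: int, minutes: float) -> float:
    """Average tokens processed per second over the whole prompt."""
    return prompt_tokens / (minutes * 60.0)

rate = implied_prefill_rate(1_000_000, 30)
print(f"{rate:.0f} tokens/s")  # roughly 556 tokens/s on average
```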


You might want to try caching the processed prompt to a file with MLX, so the long prompt only has to be processed once.

https://github.com/ml-explore/mlx-examples/pull/956

edit: here's a quick example for Qwen2.5-1M from an MLX dev

https://x.com/awnihannun/status/1883611098081099914
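
A minimal sketch of what prompt caching to a file can look like with the `mlx_lm` Python package. This is not the code from the linked PR or tweet. It assumes a recent `mlx-lm` install on Apple silicon that exposes `make_prompt_cache`, `save_prompt_cache` and `load_prompt_cache` in `mlx_lm.models.cache`; the helper names, the model path argument, and the cache file name are placeholders.

```python
# Sketch only: process a long prompt once, save the KV cache to disk,
# then reuse it for follow-up questions. Assumes a recent mlx-lm;
# imports are deferred so the module loads even where mlx is missing.

def build_and_save_cache(model_path: str, long_prompt: str, cache_file: str) -> None:
    """Run the long prompt through the model once and write the cache to a file."""
    from mlx_lm import load, generate
    from mlx_lm.models.cache import make_prompt_cache, save_prompt_cache

    model, tokenizer = load(model_path)
    cache = make_prompt_cache(model)
    # Generating a single token forces the whole prompt to be processed,
    # which fills the cache as a side effect.
    generate(model, tokenizer, prompt=long_prompt, max_tokens=1, prompt_cache=cache)
    save_prompt_cache(cache_file, cache)


def ask_with_cache(model_path: str, question: str, cache_file: str, max_tokens: int = 256) -> str:
    """Load the saved cache and answer a question without reprocessing the long prompt."""
    from mlx_lm import load, generate
    from mlx_lm.models.cache import load_prompt_cache

    model, tokenizer = load(model_path)
    # Reload from disk for every question: generation appends to the cache in place.
    cache = load_prompt_cache(cache_file)
    return generate(model, tokenizer, prompt=question, max_tokens=max_tokens, prompt_cache=cache)
```

You would call `build_and_save_cache` once with the codebase as the prompt, then `ask_with_cache` for each question. The cache file has to be used with the same model that produced it.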


That's cool, thank you, but does MLX support the Qwen 1M context yet?


According to the tweet, not the full context yet.



