Hacker News

It isn't streaming the Ollama output, so it feels slow (~3 words/second on a 3090 with the defaults). Using Ollama directly, output streams within a second and you can kill it early. I don't understand the UX of looping responses to the same question either. This does not feel like magic.


It's currently set not to stream (https://github.com/guywaldman/magic-cli/blob/4d4dca034063aa6...). The performance is something I plan to improve.
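For context, here is a minimal sketch of what consuming a streamed response looks like, assuming Ollama's documented newline-delimited JSON format for `/api/generate` with `"stream": true` (each line is a JSON chunk with a `response` fragment and a `done` flag). The simulated list stands in for an HTTP response body so the example is self-contained:

```python
import json

# With "stream": true, Ollama's /api/generate returns newline-delimited
# JSON: one chunk per token batch, each carrying a "response" fragment
# and a "done" flag on the final chunk. A client can print fragments as
# they arrive instead of waiting for the whole completion.
# The list below simulates a streamed response body (an assumption for
# illustration; a real client would iterate over HTTP response lines).
simulated_stream = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": false}',
    '{"response": "", "done": true}',
]

def stream_tokens(lines):
    """Yield response fragments until the final chunk signals done."""
    for line in lines:
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break

# Printing each fragment as it is yielded is what makes the first
# words appear within a second instead of after the full generation.
text = "".join(stream_tokens(simulated_stream))
print(text)
```

Because the user sees output as soon as the first chunk lands, they can also interrupt early, which matches the behavior described above when using Ollama directly.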



