Hacker News

I'm curious why not. I am running a few different models on my Mac Studio. I'm using llama.cpp, and it performs amazingly fast for the $7k I spent.


I said in parallel.


Surely you can run smaller models together.
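One way to do this with llama.cpp is to launch a separate `llama-server` instance per model, each on its own port. A minimal dry-run sketch, assuming illustrative model filenames and ports (the `.gguf` paths are placeholders, not real files); `-np` sets the number of parallel request slots per server:

```shell
# Hypothetical sketch: serve several small GGUF models side by side,
# one llama-server instance per port. Echoed as a dry run; drop the
# "echo" to actually launch them in the background.
MODELS="small-model-a.gguf small-model-b.gguf small-model-c.gguf"
PORT=8080
for m in $MODELS; do
  # -np 4 asks each instance for 4 parallel slots (batched requests)
  echo "llama-server -m $m --port $PORT -np 4 &"
  PORT=$((PORT + 1))
done
```

On a Mac Studio with plenty of unified memory, several small models can stay resident at once, and each server handles its own requests independently.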



