
> That would be enough to support a single user. If you want to host a service that provides this to 10k users in parallel your cost per user scales linearly with the GPU costs you posted.

No. The magic of batching lets you handle many user requests in parallel using the same copy of the weights, with little VRAM overhead per user.
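A rough back-of-the-envelope sketch of why this isn't linear. The numbers here are made up for illustration (a hypothetical 140 GB of weights and 1 GB of per-request state such as the KV cache), and the function names are mine, not from any library:

```python
def vram_gb(num_users, weights_gb=140.0, per_user_gb=1.0):
    """Total VRAM for one batch: a single shared copy of the weights
    plus per-request state (hypothetical, illustrative sizes)."""
    return weights_gb + num_users * per_user_gb


def vram_per_user_gb(num_users, weights_gb=140.0, per_user_gb=1.0):
    """VRAM attributable to each user when the weights are shared."""
    return vram_gb(num_users, weights_gb, per_user_gb) / num_users


if __name__ == "__main__":
    for n in (1, 8, 64):
        print(f"{n:>3} users: total {vram_gb(n):.1f} GB, "
              f"per user {vram_per_user_gb(n):.2f} GB")
```

Under these assumptions the total grows only by the per-request state, so the per-user share drops quickly as the batch grows. In practice the batch size is still capped by available VRAM and compute, so you add GPUs eventually, just not one set per user.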


