
I think you're missing an important aspect: how many users do you want to support?

> For comparison, you can buy a DGX B200 with 8x B200 Blackwell chips and 1.4TB of memory for around $500k. Two systems would give you 2.8TB memory which is enough for this.

That would be enough to support a single user. If you want to host a service that provides this to 10k users in parallel your cost per user scales linearly with the GPU costs you posted. But we don't know how many users a comparable wafer-scale deployment can serve (and the costs you posted for it are disputed further down the thread as well), so the comparison doesn't tell us much: you're missing data.





> That would be enough to support a single user. If you want to host a service that provides this to 10k users in parallel your cost per user scales linearly with the GPU costs you posted.

No. The magic of batching lets you handle multiple user requests in parallel using the same weights, with little VRAM overhead per user.
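
A rough back-of-envelope sketch of why that is, in Python. Only the 2.8TB total comes from the thread; the weight size (2000 GB) and the per-user KV-cache size (2 GB) are made-up numbers for illustration, and the function names are hypothetical. It ignores activations, fragmentation, and throughput limits entirely.

```python
def users_with_batching(total_mem_gb, weights_gb, kv_cache_gb_per_user):
    """Weights are loaded once and shared; each user only adds a KV cache."""
    free_gb = total_mem_gb - weights_gb
    if free_gb < 0:
        return 0  # the model does not even fit
    return free_gb // kv_cache_gb_per_user


def users_without_sharing(total_mem_gb, weights_gb, kv_cache_gb_per_user):
    """Naive model: every user needs a private copy of the weights."""
    return total_mem_gb // (weights_gb + kv_cache_gb_per_user)


TOTAL_GB = 2800    # two DGX B200 systems, per the parent comment
WEIGHTS_GB = 2000  # assumption: hypothetical model size
KV_GB = 2          # assumption: hypothetical per-user KV cache

print(users_with_batching(TOTAL_GB, WEIGHTS_GB, KV_GB))    # 400
print(users_without_sharing(TOTAL_GB, WEIGHTS_GB, KV_GB))  # 1
```

Under these assumed numbers the same hardware fits hundreds of concurrent users rather than one, so memory cost per user does not scale linearly with the GPU cost. The real limit depends on context length and on how much throughput each user needs.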



