
I don't understand what's supposedly not optimized on the 5090. If we were comparing against Apple chips or AMD Strix Halo, sure, you'd have very different hardware and software support, no FP4, etc. But here everything is CUDA, Blackwell vs. Blackwell, same FP4 structured sparsity. So how is it honest to compare a quantized FP4 model on the Spark with an unoptimized FP16 model on a 5090?




My read is that they're saying the Spark can run an unoptimized FP16 model with 200B parameters. But I don't really know.

You can't. The Spark has 128GB VRAM; the highest you can go in FP16 is 64B — and that's with no space for context.

200B is probably a rough estimate of Q4 + some space for context.

The Spark has 4x the VRAM of a 5090. That's all you need to know from a "how big can it go" perspective.
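
Rough back-of-the-envelope sketch, weights only with no room for context (assuming 32GB on the 5090, 2 bytes per FP16 parameter, and decimal GB, i.e. 10^9 bytes):

  # Hypothetical sketch: largest model whose weights alone fit in a memory budget
  GB = 1e9  # decimal GB, to match the ~2GB-per-billion-params rule of thumb

  def max_params_billions(memory_gb, bytes_per_param):
      return memory_gb * GB / bytes_per_param / 1e9

  for name, mem_gb in [("RTX 5090 (32GB)", 32), ("DGX Spark (128GB)", 128)]:
      fp16 = max_params_billions(mem_gb, 2.0)  # 16-bit weights: 2 bytes/param
      fp4 = max_params_billions(mem_gb, 0.5)   # 4-bit weights: 0.5 bytes/param
      print(f"{name}: ~{fp16:.0f}B at FP16, ~{fp4:.0f}B at FP4 (weights only)")

That works out to roughly 16B/64B params for the 5090 and 64B/256B for the Spark (FP16/FP4), which is where the 64B FP16 ceiling mentioned above comes from.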


From the NVIDIA DGX Spark datasheet:

  With 128 GB of unified system memory, developers can experiment, fine-tune, or inference models of up to 200B parameters. Plus, NVIDIA ConnectX™ networking can connect two NVIDIA DGX Spark supercomputers to enable inference on models up to 405B parameters.

The datasheet isn't telling you the quantization (intentionally). Model weights at FP16 are roughly 2GB per billion params. A 200B model at FP16 would take 400GB just to load the weights; a single DGX Spark has 128GB. Even two networked together couldn't do it at FP16.

You can do it if you quantize to FP4, and Nvidia's special variant of FP4, NVFP4, isn't too bad (and it's optimized on Blackwell). Some models are even trained at FP4 these days, like the gpt-oss models. But gigabytes are gigabytes, and you can't squeeze 400GB of FP16 weights into only 128GB (or 256GB) of space.

The datasheet is telling you the truth: you can fit a 200B model. But it's not saying you can do that at FP16 — because you can't. You can only do it at FP4.
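
A quick sketch of that arithmetic, weights only, ignoring KV cache and activations (GB here means 10^9 bytes):

  def weights_gb(params_billion, bits_per_param):
      # memory for the weights alone, in GB (10^9 bytes)
      return params_billion * bits_per_param / 8

  for bits, label in [(16, "FP16"), (8, "FP8"), (4, "FP4/NVFP4")]:
      gb = weights_gb(200, bits)
      print(f"200B @ {label}: {gb:.0f}GB "
            f"(one 128GB Spark: {gb <= 128}, two networked: {gb <= 256})")

So a 200B model fits on a single Spark only at 4-bit (about 100GB of weights), and even the two-Spark 405B figure only works at FP4 (405B * 0.5 bytes is roughly 203GB).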


I never claimed the 200B model was FP16.

If the 200B figure were at FP16, marketing could've turned around and claimed the DGX Spark could handle a 400B model (with an 8-bit quant) or an 800B model with some 4-bit quant.

Why would marketing leave such low-hanging fruit on the tree?

They wouldn't.



