artemisart's comments

That's very true and it's what's segmenting the market, but I don't understand why you're saying the 5090 supports only a 12B model when it can go up to 50-60B (= a bit less than 64B, to leave room for inference), as it supports FP4 as well.

It's for comparison using raw, non-optimized models. Both can do much better when you optimize for inference.

The information is in the ratio of these numbers, and the ratios stay the same.


Ok then just to clarify: you can fit 4x larger models on the Spark vs 5090, not 17x.

@nabla9 has tried to tell you that, for the DGX Spark, you can also use optimized models; this means the Spark can also be used for inference with bigger models, such as those exceeding 200B.

Please compare the same things: carrots VS carrots, not apples VS eggs.


I don't understand what's not optimized on the 5090. If we're comparing with Apple chips or AMD Strix Halo, then yes, you'll have very different hardware + software support, no FP4, etc. But here everything is CUDA, Blackwell vs Blackwell, same FP4 structured sparsity, so I don't get how it would be honest to compare a quantized FP4 model on the Spark with an unoptimized FP16 model on a 5090.

I think what they're saying is that the Spark can use an unoptimized FP16 model with 200B parameters. However, I don't really know.

You can't. The Spark has 128GB VRAM; the highest you can go in FP16 is 64B — and that's with no space for context.

200B is probably a rough estimate of Q4 + some space for context.

The Spark has 4x the VRAM of a 5090. That's all you need to know from a "how big can it go" perspective.


from the NVidia DGX Spark datasheet:

  With 128 GB of unified system memory, developers can experiment, fine-tune, or inference models of up to 200B parameters. Plus, NVIDIA ConnectX™ networking can connect two NVIDIA DGX Spark supercomputers to enable inference on models up to 405B parameters.

The datasheet isn't telling you the quantization (intentionally). Model weights at FP16 are roughly 2GB per billion params. A 200B model at FP16 would take 400GB just to load the weights; a single DGX Spark has 128GB. Even two networked together couldn't do it at FP16.
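To make that arithmetic concrete, here's a back-of-the-envelope calculator. It only counts the weights; a real deployment also needs room for KV cache, activations, and runtime overhead (and the byte-per-parameter figures are the usual rough values, not anything from the datasheet):

```python
# Rough model memory calculator: params (in billions) * bytes per param.
# 1e9 params * N bytes/param = N GB, so the formula is just a product.
def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Gigabytes needed just to hold the weights."""
    return params_b * bytes_per_param

SPARK_GB = 128  # DGX Spark unified memory

for fmt, bytes_pp in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    gb = weight_gb(200, bytes_pp)
    verdict = "fits" if gb <= SPARK_GB else "does not fit"
    print(f"200B @ {fmt}: {gb:.0f} GB -> {verdict} in {SPARK_GB} GB")
```

Only the FP4 row lands under 128 GB, which is why the 200B headline number implies a ~4-bit quant.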

You can do it, if you quantize to FP4 — and Nvidia's special variant of FP4, NVFP4, isn't too bad (and it's optimized on Blackwell). Some models are even trained at FP4 these days, like the gpt-oss models. But gigabytes are gigabytes, and you can't squeeze 400GB of FP16 weights into only 128GB (or 256GB) of space.

The datasheet is telling you the truth: you can fit a 200B model. But it's not saying you can do that at FP16 — because you can't. You can only do it at FP4.


I never claimed the 200B model was FP16.

If the 200B model were at FP16, marketing could've turned around and claimed the DGX Spark could handle a 400B model (with an 8-bit quant) or an 800B model at some 4-bit quant.

Why would marketing leave such low-hanging fruit on the tree?

They wouldn't.


You and nabla9 are the ones comparing apples and eggs. 4x more RAM means 4x larger models when everything else is held the same, which is the only fair comparison.

The economics don't make sense: each video is stored ~once (+ replication etc., but call it O(1)) yet viewed n times, so server-side upscaling on the fly is way too costly, and client-side upscaling currently isn't good enough.
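The O(1)-vs-O(n) point as a toy cost model (all numbers and function names here are mine, purely for illustration):

```python
# Toy comparison: upscale once at upload time vs. on the fly per view.
# Dollar figures are made up; only the scaling behavior matters.
def upload_time_cost(cost_per_video: float) -> float:
    return cost_per_video            # paid once per stored video, O(1)

def on_the_fly_cost(cost_per_view: float, views: int) -> float:
    return cost_per_view * views     # paid again on every view, O(n)

views = 100_000                      # hypothetical view count
once = upload_time_cost(1.00)        # hypothetical $1 to upscale once
streaming = on_the_fly_cost(0.01, views)
print(once, streaming)               # per-view cost dwarfs the one-time cost
```

Even with a per-view cost 100x cheaper than the one-time job, the on-the-fly path overtakes it after 100 views; popular videos get millions.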

Are you considering that the video needs to be stored for potentially decades?

Also, shorts seem to be increasing exponentially... but YouTube viewership is not. So compute wouldn't need to increase as fast as storage.

I obviously don't know the numbers. I'm just saying it could be a good reason why YouTube is doing this AI upscaling; I really don't see why otherwise. There's no improvement in image quality, quite the contrary.


> Then, once that is perfected, they will offer famous content creators the chance to sell their "image" to other creators, so less popular underpaid creators can record videos and change their appearance to those of famous ones, making each content creator a brand to be sold.

I'm frightened by how realistic this sounds.


Yes, I don't understand how they can claim it's optimized for legibility when the base font does the opposite.


Gitless is this fork: https://marketplace.visualstudio.com/items?itemName=maattdd.... It's no longer updated but still works well.


Pyrefly is not tied to VS Code? Also, please try to be more considerate of people's preferences; PyCharm is not strictly better. Remote dev on VS Code is very convenient for me. Should I go on the Internet saying PyCharm is trash? No.


I'm not saying VS Code is trash, but I think it's closer to a text editor than an IDE. I even use it for some non-Python things, but I remain curious why people use it for Python.

It might not be tied to VS Code but the title clearly says “New […] IDE experience for…” which is why I commented. I had hoped to see something for PyCharm or even a new IDE.


That's what you think; other people think something else.


But it is, as long as the positional embeddings are sufficient, i.e. use relative positional embeddings here.
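A minimal sketch of what "relative" buys you, assuming the usual transformer-attention setting (the T5-style bias table is just one variant of the idea):

```python
import numpy as np

# With relative positional embeddings, attention between positions i and j
# depends only on the offset (j - i), not on the absolute positions, so the
# same learned bias is reused along every diagonal of the attention matrix.
def relative_positions(seq_len: int) -> np.ndarray:
    pos = np.arange(seq_len)
    return pos[None, :] - pos[:, None]  # rel[i, j] = j - i

rel = relative_positions(4)
# Each diagonal holds a single offset value, so one learned bias per offset
# covers the whole sequence, at any length.
print(rel)
```

Absolute embeddings, by contrast, tie what the model learns to specific positions seen in training, which is exactly what breaks when lengths change.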


But do you understand it will harm Hollywood? This is the economic, country-scale equivalent of saying "fuck your movies". Do you think the answer will be "oh sorry, I'll keep buying yours" or "fuck your movies too"?


It will reduce the profit margins of Hollywood studios. Maybe reduce the bonuses of their CEOs.

It'll help the thousands of Americans who used to be employed by those studios but were fired, not because they couldn't do the job, but because the greed of studio execs made them move production outside of the US, playing financial-arbitrage games against the interests of the US.

Trump is trying to reverse this by making movies made outside of the US, especially by US companies, more expensive. The goal here is not some statement about Hollywood but to bring movie production jobs back to the US: the extras, the people who build sets, the people who make food for actors, etc. And jobs bring tax revenue and grow GDP, which helps every American.


It’s categorically impossible that this could bring movie production jobs to the US. If a policy along these lines is implemented, both foreign and domestic consumers will launch a massive retaliatory boycott. (I will personally begin that boycott today unless Hollywood executives denounce the proposal.) The current dominance of American media offers absolutely no leverage, because it’s trivial and cost-free to go cold turkey on it.


This kills the movies.


You really think CEOs will choose to reduce their bonuses to help thousands of common Americans?


ChatGPT free gets it right without reasoning mode (it still explained some steps): https://chatgpt.com/share/6810bc66-5e78-8001-b984-e4f71ee423...


The first sentence of the introduction ends with "we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model", so yes, it's lossless.
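For intuition on how a lossless 30% is even possible: exponent bits in trained BFloat16 weights are highly non-uniform, so a variable-length entropy code (DFloat11 uses a Huffman-style one) can shrink them while decoding back bit-for-bit. A toy sketch with synthetic numbers:

```python
import math
from collections import Counter

# Shannon entropy of a symbol stream, in bits per symbol. If the entropy is
# well below the fixed width used to store each symbol, an entropy code can
# compress the stream losslessly down toward that bound.
def entropy_bits(symbols) -> float:
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic "exponent" stream concentrated on a few values, the way real
# weight distributions tend to be (these counts are made up):
exponents = [126] * 700 + [125] * 200 + [127] * 90 + [120] * 10
print(entropy_bits(exponents))  # far below the 8 bits stored per exponent
```

The compression only changes the encoding, never the values, which is why decoded outputs stay bit-for-bit identical.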

