
It’s not a direct apples-to-apples comparison, but compared to an RTX 4090 this isn’t compelling. Other than the 32GB of VRAM, there’s not much going for this card. For $4k, most people would be better off with dual 4090s.


It has a few things going for it:

1) Enterprise-grade design, with features like ECC. I like ECC.

2) VRAM is basically the limiting factor on running LLMs. On my GPUs, I look at cost versus VRAM. Fitting a model in one PC helps. Fitting a model on one GPU helps more.

3) It's just over half the power (250W versus 450W). The enterprise cards are clocked a little bit slower, which dramatically reduces power.

At least with the older Ampere generation, the enterprise cards also fit better. The consumer cards had giant coolers, which made sticking several in one computer more complex.


> The enterprise cards are clocked a little bit slower, which dramatically reduces power.

But you can (and should) undervolt RTX 4090s if you care about power efficiency. Anyone buying one would know how to.


Does that get them to 250 W?


Not OP, but you can limit the max wattage using nvidia-smi. Here's a good article with instructions and how to determine the best wattage.

https://betterprogramming.pub/limiting-your-gpu-power-consum...
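
If it helps, I believe the article boils down to something like `sudo nvidia-smi -pl <watts>`. Here's a rough Python sketch of the same idea using the nvidia-ml-py (pynvml) bindings; the 250 W target is just an example value, not a recommendation, and setting the limit needs root:

    # pip install nvidia-ml-py
    # Sketch: cap GPU 0's power limit. NVML works in milliwatts; needs root.
    import pynvml

    TARGET_WATTS = 250  # example target only

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    # Ask the driver what range this card actually allows, then clamp to it.
    min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
    target_mw = max(min_mw, min(TARGET_WATTS * 1000, max_mw))

    print(f"allowed: {min_mw // 1000}-{max_mw // 1000} W, setting {target_mw // 1000} W")
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

    pynvml.nvmlShutdown()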


That's really interesting, thanks! I didn't realize you could do this through a simple option.


> I look at cost versus VRAM. Fitting a model in one PC helps. Fitting a model on one GPU helps more.

Yes, unfortunately I don't see 128GB of VRAM coming any time soon, even with a 512-bit GDDR memory interface.


I would very, very gladly pay for a GPU which took normal DIMMs, and let me get up to 256GB.

I can buy a 32GB DIMM for <$50. Eight of them would do it for $400.

YES, I know the performance hit, but I'm not limited so much by performance as by which models fit. That's true of a lot of other people too.

The only way I can see this happening is if Intel or AMD get serious about AI. It would seriously undercut Nvidia's business.


The 3090 also has NVLink. So as far as the computer is concerned, 2x3090 with NVLink looks like one GPU.


The 6000 makes a lot more sense than the 5000.

Nvidia uses RAM limits as a non-linear cost-scaling vector, the same way OpenAI used context window size. But that opens an opportunity for competitors to disrupt them long term.


Presumably the 32GB of VRAM is what makes it compelling, as you could cram some fairly substantial AI models on there.


You can split your AI models between two 4090s and have access to 48GB of VRAM for less money. You do take a slight performance hit compared to running on one card, but the 4090 also has more CUDA cores, which helps mitigate the slowdown.
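
For inference, the low-effort way to do that split is Hugging Face's device_map="auto", which shards the layers across whatever GPUs are visible. Rough sketch; the model name here is just a placeholder for anything too big to fit on a single 24GB card:

    # pip install transformers accelerate
    # Sketch: let accelerate spread a large model's layers across both GPUs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "some-org/some-40b-model"  # placeholder, not a real checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # shard layers across all visible GPUs
        torch_dtype=torch.float16,  # halve the memory footprint vs fp32
    )

    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))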


Is it easy to split your model onto multiple GPUs though?


How do you split your model to train on multiple GPUs while pooling VRAM?


deepspeed
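
Roughly, DeepSpeed's ZeRO stage 3 partitions parameters, gradients, and optimizer state across GPUs, so the cards' VRAM is effectively pooled. A minimal sketch, assuming a toy model and a launch like `deepspeed --num_gpus=2 train.py`:

    # pip install deepspeed
    # Sketch: ZeRO stage 3 shards params/grads/optimizer state across GPUs.
    import torch
    import deepspeed

    # Toy stand-in; in practice this is your real network.
    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)])

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    }

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    # A training step then looks like:
    #   loss = compute_loss(model_engine(batch))
    #   model_engine.backward(loss)
    #   model_engine.step()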


A 33% increase in VRAM over a consumer card for a 150% increase in price?

If CUDA ever becomes 100% compatible on AMD, we'll actually see competition in this space.


Yeah, because price is a deciding factor for corporations, lol


Have you worked at an enterprise? Everything is about money.

Don't forget the scale at which these operate. $1000 extra in a GPU adds up quickly when you buy a thousand of them.

Perhaps in big tech it's different, but in our huge multinational, anything IT is seen as a cost center only and has to be done on the cheap. With a real focus on short-term savings, because who knows if this year's SVP will still be in that seat next year.

Nobody ever told me "let's take the most expensive option because it's the best in the long run" :(


That's kinda what the top parent comment is talking about, though. Money doesn't matter when you only have one choice. Enterprises can ONLY buy Nvidia cards, so money really doesn't matter in this case. Technical execution is more important than anything else, including money. If you can't afford, or can't make enough profit to buy, the cards that support what you need (more VRAM in this case), the project is dead in the water to begin with.


In the long term it is


Not really.

If a SWE salary is $200k, total cost is probably around $400k. A $4k GPU, if bought each year, increases SWE cost by 1-2%.

Buying a nice GPU was the best purchase my employer made, even if I only use it every 6 months. It's already paid for itself many times over.


You should never compare an additional cost to the overall cost in a production setting. You should compare it to the profit margin. If you are running a 10% margin, then a 2% cost increase is roughly -20% profit (e.g., $100 revenue and $90 costs leave $10 profit; costs rising 2% to $91.80 leave $8.20), quite a substantial hit.


The key question is output per dollar.

If employees are more efficient, I need fewer of them. People cost a lot of money relative to capital expenses like nice laptops, good chairs, or individual GPUs.

This kind of spending almost always pays off very quickly.


Yes, I also believe that. The only thing I argue against is the 'negligible' marginal cost line fed to devs by dev gurus, because it is a fallacy.

I'm all for investing in quality work experience. Decent equipment is the most obvious part. Not only for the potential higher productivity, but also for its definite HR/retention effects. Investing in tangible production improvements and comfort is money spent 10x more effectively than the semi-mandatory HR fun/social events that have metastasised throughout tech.

Where asking didn't get results, in contrast to another colleague who just went "I guess we will just deliver below potential then," I used to buy my own stuff because I respected my time. But it definitely didn't make me want to keep working there.

Now this does not mean you have to shell out for every whim. If you want a €10,000 PC setup instead of a €4,000 one, I will need a better justification than 'custom loop and RGB'.


Let me be very clear: Because I have a fancy GPU in my computer, I was able to prototype algorithms in my spare time (e.g. nights and weekends), learn a lot more about the state of machine learning, and use this in production.

I can think of one specific use -- processing a data set -- which let me do automatically something that would have taken a few hours of manual time. My hourly salary times the hours saved on that one use already paid for the GPU.

I used Stable Diffusion many times to make images for presentations before it was mainstream (yes, there were a few months before everyone was using DALL-E and SD). That had a noticeable impact which went well beyond the cost of the GPU.

And so on.

Yes, I could have done this at lower cost on cloud-based services with only a little bit more overhead, but:

1) I wouldn't have put in the extra work

2) If I had, I wouldn't have been able to justify the expense while things were still at the half-baked stage

Things like Hugging Face are at a level where this DOES increase productivity by far more than it costs, and not in the abstract HR/retention-effects way, but in terms of concrete output.

More critically, these tools are now part of our tool set. That means when new opportunities come up, we can leverage them.


Or 2x 3090s to make it even more cost effective.

If you're using these purely for inference (which is what the 5000 seems to be tuned for), the VRAM's the real bottleneck, so you get similar bang for your buck using last gen's cards vs. the bleeding edge.

I've also found the crypto crash has dumped a bunch of these well-worn cards on the market, which you can pick up a bit cheaper.


It should give pause to anyone considering buying an Apple Silicon device for inference, at least.

And while you're right, I personally have no interest in running two 4090s. My 24GB 3090s and RTX 5500s can feel a little meager at times, but 4090s are still overpriced and power hungry. If I'm moving up, it'll be for density, and a few hundred won't matter.



