Nvidia H100 GPUs: Supply and Demand (llm-utils.org)
227 points by tin7in on Aug 1, 2023 | 160 comments



The real gut-punch here is the reminder of how far behind most engineers are in this race. With web 1.0 and web 2.0 you could at least rent a cheap VPS for $10/month and try out some stuff. There is almost no universe where a couple of guys in their garage are getting access to 1000+ H100s, with a capital cost in the multiple millions. Even renting at that scale is $4k/hour. That is going to add up quickly.

I hope we find a path to at least fine-tuning medium sized models for prices that aren't outrageous. Even the tiny corp's tinybox [1] is $15k and I don't know how much actual work one could get done on it.

If the majority of startups are just "wrappers around OpenAI (et al.)" the reason is pretty obvious.

1. https://tinygrad.org/


I'd argue that you really don't need 1000+ H100s to test things out and make a viable product.

When I was at Rad AI we managed just fine. We took a big chunk of our seed round and used it to purchase our own cluster, which we set up at Colovore in Santa Clara. We had dozens, not hundreds, of GPUs and it set us back about half a million.

The one thing I can't stress enough: do not rent these machines. For the cost of renting a machine from AWS for 8 months you can own one of these machines and cover all of the datacenter costs, which basically makes it "free" from the eight-month to the three-year mark. Once we decoupled our training from cloud prices we were able to do a lot more training and research. Maintenance of the machines is surprisingly easy, and they keep their value too, since there's such high demand for them.
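
A rough back-of-the-envelope on that breakeven, where every number is an assumption for illustration (on-demand 8xA100 pricing, server cost, and colo fees all vary by vendor and negotiation):

    # Rent-vs-buy breakeven sketch; all figures are illustrative assumptions.
    RENT_PER_HOUR = 32.77     # assumed on-demand price for an 8xA100 instance
    BUY_PRICE = 150_000       # assumed purchase price for an 8xA100 server
    COLO_PER_MONTH = 2_500    # assumed power/space/remote hands at a colo

    rent_per_month = RENT_PER_HOUR * 24 * 30
    breakeven_months = BUY_PRICE / (rent_per_month - COLO_PER_MONTH)
    print(f"Renting: ~${rent_per_month:,.0f}/month")
    print(f"Breakeven vs buying: ~{breakeven_months:.1f} months")

With those assumptions the purchase pays for itself in roughly seven to eight months, which is where that "free after eight months" framing comes from.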

I'd also argue that you don't need the H100s to get started. Most of our initial work was on much cheaper GPUs, with the A100s we purchased being reserved for training production models rapidly. What you need, and is far harder to get, is researchers who actually understand the models so they can improve the models themselves (rather than just compensating with more data and training). That was what really made the difference for Rad AI.


I did choose the 1000+ H100 case as the outlier. But even what you are describing, $500k for dozens of A100s or whatever entry level looks like these days, is a far cry from the $10/month of previous generations. This suggests we will live in a world where VCs have even more power than they did before.

Even if I validate my idea on an RTX 4090, the path to scaling any idea gets expensive fast. $15k to move up to something like a tinybox (probably capable of running a 65B model, but is it realistic to train or fine-tune a 65B model?). Then maybe $100k in cloud costs. Then maybe $500k for a research-sized cluster. Then $10m+ for enterprise grade. I don't see that kind of ramp happening outside well-financed VC startups.


Not every company is OpenAI. What OpenAI is trying to do is solve a generic problem, and that requires their models to be huge. There's a ton of space for specialized models though, and those specialized ones still outperform the more general one. Startups can focus on smaller problems than "solve everything, but with kind of low quality". Solving one specific problem well can bring in a lot of customers.

To put it another way, the $10m+ for enterprise grade just seems wrong to me. It's more like $10m+ for mediocre responses to a lot of things. Rad AI didn't spend $10m on their models, but they absolutely are professional grade and are in use today.

I also think it's important to consider capital costs that are a one time thing, versus long term costs. Once you purchase that $10m cluster you have that forever, not just for a single model, and because of the GPU scarcity right now that cluster isn't losing value nearly as rapidly as most hardware does. If you purchase a $500k cluster, use it for three years, and then sell it for $400k you're really not doing all that bad.


That is a decent point, in that it reminds me of a startup that posted on HN a couple of months ago that did background removal from images using AI models. They claimed this was a mature market now, where bulk pricing was bringing the cost down to some margin over the price of compute. I suspect those kinds of models are comparatively small compared to the general-intelligence LLMs we are seeing and might reasonably be trainable on $250k clusters. There is likely a universe of low-hanging fruit for those kinds of problems and for those who are capable. That is definitely not a market I would want to compete in, since once a particular problem is sufficiently solved it becomes a race to the bottom on cost.

But my (totally amateur and outsider-informed) intuition is that the innovative work will still happen at the edge of model size for the next few years. We literally just got the breakthroughs in LLM capabilities around the 30B parameter mark. These capabilities seemed to accelerate with larger models. There appears to be a gulf in capabilities from 7B to 70B parameter LLMs that makes me not want to bother with LLMs at all unless I can get the higher-level performance of the massive models. But even if I did want to play around at 30B or whatever, I have to pay $15k-$100k.

I think we are just in a weird spot right now where the useful model sizes for a large class of potential applications are at a price point that many engineers will find prohibitively expensive to experiment with on their own.


For the first example, I think that was just due to the specific problem being solved. I can tell you there are a ton of problems that aren't "solved" yet, and that aren't trivial to solve either. One thing we haven't discussed in this conversation is the data itself, and cleaning up that data. Rad AI probably spent more money on staff cleaning up data than they did on model training. This isn't trivial: for medical-grade stuff you need physician data scientists to help out, and that field has only really existed since 2018 (which was the first time the title appeared in any job listing). The reason background removal is "mature" is that it's not that hard a problem and there's a good amount of data out there.

I also think that you're way off on the second point. I'm not saying that to be rude, because it does seem to be a popular opinion. It's just that if you read papers, most people publishing aren't using giant clusters. There's a whole field of people who are finding ways to shrink models down. Once we understand the models we can also optimize them. You see this happen in all sorts of fields beyond "general intelligence": tasks that used to take entire clusters to run can work on your cell phone now. Optimization is important not just because it opens the work up to more people, but also because it drives down the costs that these big companies are paying.

Let's think about this in another direction. ML models are based on how the brain is thought to work. The human brain is capable of quite a bit, but it uses very little power: about 10 watts. It is clearly better optimized than ML models are. That means there's a huge efficiency gap we still have to close.


> It's just that if you read papers most people publishing aren't using giant clusters.

There is a massive difference between what is necessary to prove a scientific thesis and what is necessary to run a profitable business. And what do you mean by "giant clusters" in this context? What is the average size of the clusters used in groundbreaking papers, and what is their cost? Is that cost a reasonable amount for a bootstrapped startup to experiment with, or are we getting into territory where only VC-backed ventures can even experiment?

> There's a whole field of people who are finding ways to shrink models down

Of course the cost of running models is going to come down. The literal article we are responding to is a major part of that equation. You seem to be making arguments about how the future will be as support for an argument against how the present is.

Presently, hardware costs are insanely high and not coming down soon (as per the article). Presently, useful models for a large set of potential applications require significant cluster sizes. That makes it presently difficult for many engineers to jump in and play around.

My opinion is that the cost has to come down to the point that hobbyist engineers can play with high-quality LLMs at the model sizes that are most useful. That doesn't imply that there are no model sizes for other use cases that can be developed today. It doesn't imply that the price of the hardware and the size of the models will not fall. It just implies that dreaming of a business based around a capable LLM means your realistic present-day costs are in the tens of thousands at a minimum.


>What you need, and is far harder to get, is researchers who actually understand the models so they can improve the models themselves

Serious question: where does an aspiring AI/ML dev get that expertise? From looking at OMSCS I'm not convinced even a doctorate from Georgia Tech would get me the background I need...


Everyone I've met with these skills has either a masters degree or a PhD. I do know several people who got their PhD earlier in their careers who are really into AI now, but they had the foundational math skills to keep current as new papers were published.

I can't tell you if one program is better than another, as it's a bit out of my area of expertise.


The foundational math skills are linear algebra, calculus, and statistics. They are bog standard math anyone with a university education in the sciences should be comfortable with. The only math that's possibly more obscure are the higher level statistics tricks like graphical models, but those can be picked up from a textbook.


I think the skill is knowing how to ingest scientific papers, choosing which papers are worth the time to read, etc...

These skills are developed during a PhD.


Bill Gates knew how to do that and he dropped out not even halfway into his undergrad.


Your (ex)-company literally has the name AI in it, so yea, it makes sense to buy compute, not rent.

That said, a lot of other businesses don't want to take on the capex, but they do need to train some models... and those models can't run on just half a million worth of hardware. In that case, someone else is going to have to do it for you.

It works both ways and there are no absolutes here.


I've found most larger companies are more concerned about opex than capex. Large companies aren't going to have much of an issue there.

My response was more for the folks the OP mentioned:

> There is almost no universe where a couple of guys in their garage are getting access to 1000+ H100s with a capital cost in the multiple millions.

I'm pointing out that this isn't true. I was the founding engineer at Rad AI; we had four people when we started. We managed to build LLMs that are in production today. If you've had a CT, MRI, or X-ray in the last year there's a real chance your results were reviewed by the Rad AI models.

My point is simply that people are really overestimating the amount of hardware actually needed, as well as the costs to use that hardware. There absolutely is space for people to jump in and build out LLM companies right now, and they don't need to build a datacenter or raise nine figures of funding to do it.


> I've found most larger companies are more concerned about opex than capex.

Another absolute. I try to not be so focused on single points of input like that.

From what I can tell, sitting on the other side of the wall (GPU provider), there are metric tons of demand from all sides.


Keep in mind that if you're a bigger customer, AWS discounts are huge (often >50% off of sticker). If the payback were 16 months instead of 8, it becomes a much tougher sell (especially with GPUs improving rapidly over time).


AWS does not offer great discounts on the GPUs at this point, as they don't have nearly enough of them to meet demand. I'm no longer at a startup, and have worked at a couple of larger companies.

That said I'm mostly responding to the "two guys in a garage" comment with this. Larger companies are going to have different needs altogether.


I wouldn't spend a single dollar on George.

The guy could wake up tomorrow and decide he didn't feel like developing this stuff any more and you're going to be stuck with a dead project. In fact, he already did that once when he found a bug in the driver.

People rip on Google for killing projects all the time, and now you want to bet your business on a guy who livestreams in front of a pirate flag? Come on.

Never mind that even in my own personal dealings with him, he's been a total dick and I'm far from the only person who says that.


What are you talking about? George has been working on comma.ai for years. It's shipping actual products and has revenue.

We need more people who "think different" and push back against the status quo instead of carrying out ad hominem attacks on public forums.


They're talking about the meltdown he had on stream [1] (in front of the mentioned pirate flag), that ended with him saying he'd stop using AMD hardware [2]. He recanted this two weeks after talking with AMD [3].

Maybe he'll succeed, but this definitely doesn't scream stability to me. I'd be wary of investing money into his ventures (but then I'm not a VC, so what do I know).

[1] https://www.youtube.com/watch?v=Mr0rWJhv9jU

[2] https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...

[3] https://twitter.com/realGeorgeHotz/status/166980346408248934...


If his grievances actually got through to the AMD CEO, I'd say he's already had a bigger impact than most.


It is good press at a time when AMD is very visibly looking like they lost out on the first round of AI. His project is open source, AMD likes that and can benefit from the free developer feedback. I'd say this is less about George and more about Lisa being the insanely smart and talented business person that she is.


Lisa is smart but this only got to her because George got publicly upset.


He emailed her after he had his meltdown. It wasn't like she saw the meltdown and wrote to him. He is nowhere on her radar.

By the way, I also got a bug in the AMD drivers fixed [0]. That bug fix enabled me to fully automate the performance tuning of 150,000 AMD GPUs that I was managing. This is something nobody else had done before; it was impossible without this bug fix. We were doing this by hand before! The only bummer was that I had to upgrade the kernel on 12k+ systems... that took a while.

I went through the proper channels and they fixed it in a week, no need for a public meltdown or email to Lisa crying for help.

[0] https://patchwork.freedesktop.org/patch/470297/?series=99134...


It is certainly possible to "think different" and not be a wannabe Steve Jobs.


1) This is just what happens when an industry matures. If you want to start a new company to drill oil wells, you're going to spend a lot of money. Same if you're starting a new railroad, a new car company, a new movie studio...

2) Speaking of VPSes and web 1.0 in the same breath is a little anachronistic. Servers had much lower capacity in 1999, and cost much more. Sun was a billion dollar company during the bubble because it was selling tens of thousands of unix servers to startups in order to handle the traffic load. Google got a lot of press because they were the oddballs who ran on commodity x86 hardware.


> I hope we find a path to at least fine-tuning medium sized models for prices that aren't outrageous

It's not that bad; there are lots of things you can do with a hobbyist budget. For example, a consumer GPU with 12 or 24 GB VRAM costs $1000-2000 and can let you run many models and do fine-tuning on them. The next step up, for fine-tuning larger models, is to rent an instance on vast.ai or something similar for a few hours with a 4-8 GPU instance, which will set you back maybe $200—still within the range of a hobbyist budget. Many academic fine-tuning efforts, like Stanford Alpaca, cost a few hundred dollars to fine-tune. It's only when you want to pretrain a large language model from scratch that you need thousands of GPUs and millions in funding.
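
To make the hobbyist end of that concrete, here is a minimal sketch of a parameter-efficient (LoRA) fine-tune of a 7B model that fits on a single 24 GB consumer card, using the Hugging Face transformers and peft libraries; the model name, target modules, and hyperparameters are placeholders, and the 4-bit loading assumes bitsandbytes and accelerate are installed:

    # Minimal LoRA fine-tuning sketch for a 7B model on one 24 GB GPU.
    # Model name, target modules, and hyperparameters are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "meta-llama/Llama-2-7b-hf"   # placeholder: any 7B causal LM
    model = AutoModelForCausalLM.from_pretrained(
        model_name, load_in_4bit=True, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of weights
    # ...then train with transformers.Trainer or trl's SFTTrainer as usual.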


The question is what happens once you want to transition from your RTX 4090 to a business. It might be cute to generate 10 tokens per second or whatever you can get with whatever model you have to delight your family and friends. But once you want to scale that out into a genuine product - you're up against the ramp. Even a modest inference rig is going to cost a chunk of change in the hundreds of thousands. You have no real way to validate your business model without making some big investment.

Of course, it is the businesses that find a way to make this work that will succeed. It isn't an impossible problem, it is just a seemingly difficult one for now. That is why I mentioned VC funding as appearing to have more leverage over this market than previous ones. If you can find someone to foot the $250k+ cost (e.g. AI Grant [1], which offers $250k cash and $350k in cloud compute) then you might have a chance.

1. https://aigrant.org/


You can use a lower-performance model, you can use an LLM-as-a-service, etc.

If you want to compete on the actual model, then yes, this is not the time for garage shops.

If your business plan is good enough, it will work without H100 cards too; or, if it's even better and you know it'll print money with H100 cards, then great, just wait.


There was a period in the 90’s when it was necessary to raise money and assemble a team just to make web products. Frameworks didn’t exist, we didn’t have the patterns we do now, everything was built for the first time and as such was 100% custom. The time of $10 VPS’s came much later.


You're comparing apples to oranges.

Should I complain that to drill oil I need hundreds of millions of dollars to even start?

Your VPS example was doing barely any computation. You're conflating web 1.0 and web 2.0 with neural networks and they are nothing alike in terms of FLOPS.


I want a tinybox so bad.


>Who is going to take the risk of deploying 10,000 AMD GPUs or 10,000 random startup silicon chips? That's almost a $300 million investment.

Ironically, Jensen Huang did something like this many years ago. In an interview for his alma mater, he tells the story about how he had bet the existence of Nvidia on the successful usage of a new circuit simulation computer from a random startup that allowed Nvidia to complete the design of their chip.


> >Who is going to take the risk of deploying 10,000 AMD GPUs or 10,000 random startup silicon chips? That's almost a $300 million investment.

Lumi: https://www.lumi-supercomputer.eu/lumis-full-system-architec...


LUMI as an "AI customer" has:

- a low budget: a taxpayer-funded supercomputer for taxpayer-funded PhD students

- high risk tolerance: it can tolerate an AI cluster arriving 5 years late (Intel and Aurora), a missing AI SW stack, etc.

- a high FP64 FLOPS constraint: nobody doing AI cares about FP64

Private companies whose survival depends on very expensive engineers (10x an EU PhD student's salary) quickly generating value from AI in a very competitive market are a completely different kind of "AI customer".


Absolutely. We could definitely chalk this up to being the "exception that proves the rule".


AMD GPUs are relatively well tested. Anybody who's looked at nvidia's architecture could tell you it's not perfect for every application. Similarly AMD's isn't either.

If you know what your application would be and have the $300 million, custom chips may be way wiser. That's something you'd only get if you make things in-house/at startups.


For which applications are AMD GPUs more suited? Last I looked at the available chips, AMD sometimes had higher FLOPS or memory throughput (and generally lower cost), but I don't recall any qualitative advantages. In contrast, just to pick something I care about, Nvidia's memory and synchronisation model allows operations like prefix sums to be significantly more efficient.


They may have had an edge on 64-bit performance, which is pretty much useless for deep learning, but can be useful for e.g. physics simulations or other natural science applications.


> but I don't recall any qualitative advantages

Like... how you feel when you use them? (-:


Oh definitely, there's a reason my home GPU is an AMD! The fact that driver troubles are (sort of) a thing of the past is a great win.


AMD has won many contracts for supercomputers no doubt due to their lower pricing. But there’s a good reason why no one is buying them in droves for AI workloads.

Also:

> For visualization workloads LUMI has 64 Nvidia A40 GPUs.


The reason is simple... AMD has been focused on the gaming market and got caught with their pants down when this AI thing happened. They know it now and you can bet they will do whatever it takes to catch back up. The upcoming MI300 is nothing to sneeze at.


Hewlett Packard Enterprise (HPE) is not a random startup.


Honestly, that's a really excellent point.

Successful startups are successful because they do exactly that. Successfully.


Risk / reward though. What's the reward? Are the alternatives better than Nvidia? Doesn't seem so and the risks are significant.

On the other hand if you can't get H100s then nothing to lose!


Successful startups cater to specific customers and provide one-on-one care that isn't typically possible with larger companies.


Do these chips actually cost this much, or do they just carry a very high markup?

I can't imagine the GPU would cost more than $100 at scale, unless they have extremely poor yields.


Nvidia might be in a special situation here, so maybe they do sell at list price, but normally all of this stuff comes at deep, deep discounts.


Do you recall the name of the startup?


It could be something like i-codes.

It's mentioned about 2 minutes into this video [1].

[1] https://www.youtube.com/watch?v=tJTp-3rtkYQ



What nobody is talking about here is that there is no more power available in the US. All the FAANGs have scooped up the space and power contracts.

You can buy all the GPUs you can possibly find. If you want to deploy 10MW+, it just doesn't exist.

These things need redundant power/cooling, real data centers, and can't just be put into chicken farms. Anything less than 10MW isn't enough compute for large-scale training now either, and you can't spread it across data centers because all the data needs to be in one place.

So yea... good luck.


There are datacenters that are specializing in this, and they exist today.

I highly recommend Colovore in Santa Clara. They got purchased by DR not too long ago, but are run independently as far as I can tell. Their team is great, and they have the highest power density per rack out of anyone. I had absolutely no problem setting up a DGX cluster there.

https://www.colovore.com/


Thanks, will call them. I doubt it is in the ranges we need though.


With things like solar and wind installs becoming more off-the-shelf, is there any path there? What does 10MW of solar/wind look like? Are we talking the size of a big ranch or the size of a small county?


It doesn't matter; this stuff wants/needs to be deployed yesterday, not in 2-3 years. I'm starting with raw power, but even that isn't the limiting factor... it goes deeper... try to buy a bunch of large transformers, those are year-long waitlists.

Texas has a lot of wind. At this scale, it is mostly grid power anyway. Grid is a mixture of everything. Oh and solar has this pesky issue of not working in the evening, so then you have another problem... storage. ;-)

I should add... you want backup generators for your UPS systems? Those have a 4.5-year backlog.


With AI training, unlike a lot of datacenter work, you have the option to "make models when the sun shines" and just turn things off at night. That'll push you towards cheaper but less power-efficient, older-node compute, but I think the business case should work.

Right now in the US there's about as much proposed renewable production planned and awaiting permitting as there is currently installed. It's the grid connections that are the long pole in expanding renewable use right now. And since the voltage that a solar panel outputs is pretty close to the voltage a GPU consumes, you've got some more savings there.

There are still a lot of challenges with that, but in general I think people should be looking for ways to co-locate intermittent production of various things with solar farms right now, from AI models to ammonia.


Is battery storage not yet commonplace? There are gobs of options: pump water uphill, spin giant flywheels, etc. Picking a battery with the right tradeoffs for your situation is a crucial consideration, I would think. And I am a subject matter expert here, having played several hours of Cities: Skylines in my day. Which gives me an idea...

Let's click 6 wind turbines down off the coast, shove our H100s underneath them for water cooling, and ah...separate the water/oxygen into tanks for hydrogen power when it ain't blowy no more? Or something? Someone help me out here.


Grid-scale batteries are basically nonexistent in the US, but also aren't particularly common elsewhere. In 2016 there was only about 160 MW [0] of battery storage available to the grid. Battery prices have come down since then, but not enough for energy storage to make sense for utilities in a lot of cases. If capacity has doubled in the past seven years, the person you're responding to would still be asking for something like 3% of available battery capacity nationwide.

As far as other storage methods, they're really cool but water and trains require a lot of space, and flywheels typically aren't well suited for storing energy for long amounts of time. That being said, pumped water is still about 10x more common than batteries right now and flywheels are useful if you want to normalize a peaky supply of electricity.

I'd like to believe we'll see more innovative stuff like you're suggesting, but I think for the time being the regulatory environment is too complicated and the capex is probably too high for anyone outside of the MAMA companies to try something like that right now.

[0] - https://www.energy.gov/policy/articles/deployment-grid-scale...


That is a "chicken farm". Nobody is going to deploy tens of millions in GPUs to something like that, let alone run their tens of millions worth of training data on that.


I'm sort of assuming this is like most other booms and will last several years. Sure, the best time to be ready to jump in was yesterday, but the second-best time is to get the ball rolling now, even if your lead time is years.

And yes obviously renewables won't cover 24/7 but if I have a choice between no data center and a 60% time data center.. give me the 60%.


I've been running large-scale GPU deployments (150k+) for years now... we are absolutely pivoting to this (yes, we took a bit too long).

But it is surprisingly hard to find investors who are willing to wait, even though we know that this stuff is going to last for decades.

username@gmail if anyone would like to have real conversations about this.


10MW of wind is a serious installation with dozens of wind turbines, if I recall correctly.



A single wind turbine is typically ~3 MW max output with an average output a little less than half that.


That's only on paper. In reality it's much less than that in output.


There's still space, but AI startups are doing the scooping. At least when they fail there will be some nice pre-built datacenter cages for people to move into.


Ping me if you have references and I'll follow up on them immediately. Thanks. username @ gmail.


I have never had to even think about the steps a firm with massive utility requirements would need to take to secure supply. So assuming you could wave a magic wand and instantly build out a datacenter in northern Virginia right now, the local power utility (Dominion Energy in this case) would not be able to provide power?


It isn't like you can snap your fingers and magically transport 10MW+ of power to your doorstep. Plus, as I said in other threads, it isn't just power... it is everything around supporting that power. Try ordering a transformer. Or getting EPA approval to install backup generators.


This is true, Tesla took most of the remainder.

We took 30 MW outside the US but also some inside the US


It's really time for some competition. Either AMD or some Chinese company like Moore Threads needs to speed up and get something on the market to break the Nvidia dominance. Nvidia is already showing some of the typical nasty, evil behavior that has to be stopped. I know it's not easy with fully booked partners at Samsung/TSMC/etc.


https://tinygrad.org is trying something around this; they're currently working on getting AMD GPUs onto MLPerf. Info on what they're up to / why is mostly here - https://geohot.github.io/blog/jekyll/update/2023/05/24/the-t... - though there are some older interesting bits too.


AMD / Intel should be throwing them tens or hundreds of millions (or just straight-up human hours of work) to make that work. If/when it does, it would be tens or hundreds of billions of (additional) market cap for them.


So why doesn't AMD invest more in quality software? Everybody knows it's holding them back. Right now a tiny corporation has to jump through hoops to do work they could have done at any point in the last decade. Why don't they pull a great team together and just do the work, if the pay off is that big?


I honestly do not know.

Perhaps they think using GPUs for computation is a passing fad? They hate money? Their product is actually terrible and they don't want to get found out (that one might be true for Intel)?


In general it's pretty rare for hardware first companies to put out good software. To me it looks like there are structural reasons for this, hardware requires waterfall development which then gets imposed on software, for instance.


They are. In side-by-side tests of FSR and DLSS, most people can't tell the difference, or pick FSR. Then you tell them which is which and they turn around and say DLSS is better. People are just biased toward Nvidia.

They haven’t had gfx card driver issues in years now and people still say “oh I don’t want AMD cos their drivers don’t work”.


FSR is vastly inferior to DLSS, not sure what you're talking about; even XeSS from Intel is better.

As for driver: https://www.tomshardware.com/news/adrenalin-23-7-2-marks-ret...


I'm not sure what you're talking about, because in side-by-side testing people can't tell the difference, with the exception of racing games (though that's screwed on DLSS 3 too anyway) and if you take screen grabs to look at. So fact is, all the compute in Nvidia cards is a gimmick. If you disagree then you're wrong. The competitive edge of DLSS is gone.

One bad driver update is not indicative of anything. Nvidia has had bad driver updates but you're not shitting all over them. And running Nvidia's own drivers on Linux is still a pain point.

(And don’t try claim I’m an AMD fanboy when I don’t even have any AMD stuff at the moment. It’s all Intel/Nvidia)


FSR is pretty bad, like it's not even close to DLSS; no one likes FSR. And saying that there is no difference is wrong, just play a game with FSR 2.1 and DLSS 2 or 3 please.


So you drank the Nvidia Kool-Aid. That's fine.


I have an AMD card, a 6800 XT, really good card, but FSR is not there yet.


I have a 4070. FSR/DLSS on quality looks the same. It's only noticeable in Forza Horizon. If you notice it in a non-racing game then you're looking for the differences.


This thread isn't about gaming, it's about compute.


I replied to a comment about software.


Can I run pytorch on it?


Yes, but you need ROCm which mostly only runs on AMD's professional cards and requires using the proprietary driver rather than the wonderfully stable open source one.


ROCm only officially supports a handful of server or workstation cards, but it works on quite a few others.

I've enabled nearly all GFX9 and GFX10 GPUs as I have packaged the libraries for Debian. I haven't tested every library with every GPU, but my experience has been that they pretty much all work. I suspect that will also be true of GFX11 once we move rocm-hipamd to LLVM 16.
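
For anyone wondering what "works on quite a few others" looks like in practice, here is a quick sanity check against a ROCm build of PyTorch; the HSA_OVERRIDE_GFX_VERSION override is the usual unofficial workaround for consumer cards that aren't on the support matrix, and the 10.3.0 value shown targets RDNA2, so treat it as an assumption to adjust for your GPU:

    # Quick check that a ROCm build of PyTorch can see an AMD GPU.
    import os
    # Unofficial workaround for unsupported consumer GPUs; 10.3.0 targets
    # RDNA2 (gfx103x) and must be set before the ROCm runtime loads.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch                              # ROCm builds expose the CUDA API
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. an RX 6800 XT
        print(torch.version.hip)              # set on ROCm builds, None on CUDA
    else:
        print("No GPU visible to the ROCm runtime")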


Have you tried Star Citizen? It’s why I didn’t buy AMD


Based on star citizen telemetry I don’t know what you have against AMD. It seems to rank the 6800/6900 quite high.


Intel is very much putting its money where its mouth is with SYCL/oneAPI. They are spending a lot of money and advancing a lot faster than AMD, and in many ways it's a better approach (focused on a CUDA-style DSL but portable across hardware) rather than just another ecosystem.

(to their credit AMD is also getting serious lately, they put out a listing for like 30 ROCm developers a few weeks after geohot's meltdown, and they were in the process of doing a Windows release (previously linux-only) of ROCm with support for consumer gaming GPUs at the time as well. The message seems to have finally been received, it's a perennial topic here and elsewhere and with the obvious shower of money happening, maybe management was finally receptive to the idea that they needed to step it up.)


Didn't they abandon development of AMD related software?


Where did you get that? They just received a shitload of GPUs and it appears AMD is actively cooperating: https://twitter.com/realGeorgeHotz/status/168616581138659737...


I don't remember the exact tweet, but here's one discussion [1]. I guess something changed in the meantime.

[1] https://www.reddit.com/r/Amd/comments/140uct5/geohot_giving_...


[1] links to https://github.com/RadeonOpenCompute/ROCm/issues/2198 which has all the context (driver bugs, vowing to stop using AMD, Lisa Su's response that they're committed to fixing this stuff, a comment that it's fixed)


He had abandoned them for about a week but then talked to the CEO and that got things back on track IIRC


George Hotz: because eventually Elon Musk's false promises and over the top shenanigans just weren't doing it for you anymore...


(I'm the author of the linked post)

Yes, much needed.

Here's a list of possible "monopoly breakers" I'm going to write about in another post - some of these are things people are using today, some are available but don't have much user adoption, some are technically available but very hard to purchase or rent/use, and some aren't yet available:

* Software: OpenAI's Triton (you might've noticed it mentioned in some of "TheBloke" model releases and as an option in the oobabooga text-generation-webui), Modular's Mojo (on top of MLIR), OctoML (from the creators of TVM), geohot's tiny corp, CUDA porting efforts, PyTorch as a way of reducing reliance on CUDA

* Hardware: TPUs, Amazon Inferentia, Cloud companies working on chips (Microsoft Project Athena, AWS Tranium, TPU v5), chip startups (Cerebras, Tenstorrent), AMD's MI300A and MI300X, Tesla Dojo and D1, Meta's MTIA, Habana Gaudi, LLM ASICs, [+ Moore Threads]

The A/H100 with infiniband are still the most common request for startups doing LLM training though.

The current angle I'm thinking about for the post would be to actually use them all. Take Llama 2, and see which software and hardware approaches we can get inference working on (would leave training to a follow-up post), write about how much of a hassle it is (to get access/to purchase/to rent, and to get running), and what the inference speed is like. That might be too ambitious though, I could see it taking a while. If any freelancers want to help me research and write this, email is in my profile. No points for companies that talk a big game but don't have a product that can actually be purchased/used, I think - they'd be relegated to a "things to watch for in future" section.


Gaudi2 and Inferentia2 are both good.

We train on A100s, TPUs and... other things now.


Also missed in the post is that FP8 is really much more efficient.

The H100s are actually very good for inference.


Give it time. AMD, AWS trainium/inferentia, and Google TPUs all compete here. The gap is mostly with software drivers/support.


It's weird that more isn't made of the fact that Google's TPUs are the only real, shipping, credible alternative to Nvidia.

I wonder how much a TPU company would be worth if Google spun it off and it started selling them?


Google kind of has done this with Coral: https://coral.ai/about-coral/

These TPUs obviously aren't the ones deployed in Google's datacenters. That being said, I'm not sure how practical it would be to deploy TPUs elsewhere.

Also, Amazon's Inferentia gets a fair bit of usage in industrial settings. It's just that these Nvidia GPUs offer an amazing breeding ground for research and cutting-edge work.


The Coral line of products targets embedded/IoT inference applications, and are on the low end as far as processing power goes. AFAICT the recent surge of demand, mostly fueled by LLMs, is for GPUs on the opposite end of processing power and RAM size.


(opinions are my own)

https://coral.ai/products/


Having worked with quants before, the reality is that however big your compute farm, they will want more. I think this is what is going on with these large AI companies: they are simply utilising all of the resources they have.

Of course they could do with more GPUs. If you gave them 1,000x their current number, they'd think up ways of utilising all of them, and have the same demand for more. This is how it should be.


There must be some point at which the cost of the endeavor becomes greater than the profits to be made.


Absolutely! But that's for management to work through; the engineers just want more :)


From what I've seen the utilisation is still pretty poor; they're being used so poorly that most companies could get away with fewer GPUs. Instead of looking at how to optimise their workflow they just slap GPUs on.


I've noticed the same. Very low utilization. But they are all used at peak every few weeks. For many companies, taking on more GPU cost to unblock the velocity of innovation here is worth it, as the benefit of improvements to your top-line revenue far exceeds the GPU cost.


Are you talking poor utilization in a "we have a bunch of GPUs sitting idle" sense, or poor utilization from a performance standpoint (can't keep their GPUs fed with data, kinda thing)?


Kinda both honestly, but more the fact that the code isn't great at using the GPUs efficiently at run time. For instance, no batching, or not realising the CPU is actually the bottleneck.


I think the author has missed a pretty large segment of demand. Non-cloud/non-tech enterprises are also buying large quantities of H100s and A100s for their own machine learning and simulation workloads. Where I work, we are going to have more than 1000 H100s by the end of the year, I am very excited to start benchmarking them soon :)


I agree. (I'm the author) Touched on that briefly here https://news.ycombinator.com/item?id=36955403. Need help with that research; please email - email is in profile. Had a section on it in early drafts; didn't feel confident enough; removed it.

Would be good to have more on enterprise companies like Pepsi, BMW, Bentley, Lowes, as well as other HPC uses like oil and gas, others in manufacturing, others in automotive, weather forecasting.


Just curious, what price does Nvidia charge enterprises? At $40k, it just doesn't make sense to buy one compared to renting from Lambda Labs or some other place for $2/hour (or ~$17k/year).


That’s a good question and I can’t give a straight answer, because we are actually on three year lease plans through a vendor (think HPE). I know we get a pretty decent deal in general from them since we have basically an entire data center worth of hardware from them, but I don’t know how good the deal is on the GPUs, since they’re in such high demand.


It's weird that this article ignores the entire traditional HPC market/DoE/DoD demand for H100s.


(Author here) I'd be interested in writing about this in the future. I need help though because I don't know people in those spaces. Email is in my profile. I had a section on this in early drafts but removed it as I didn't feel confident enough in my research.


If the bottleneck isn't TSMC wafer starts but CoWoS, where exactly does that bottleneck come from? From what I understand, it's the interposer connecting the GPU and HBM dies. Are they hard to make, is the yield bad, are there insufficient production lines, ...?


Nah, because AMD could be being used if they had the software. Also, Intel is on totally different fabs and wafers, but neither is close to Nvidia in software.


Is Nvidia even able to capture a proportionately significant amount of revenue from increases in demand for GPU cycles? As the article describes, there are real bottlenecks, but how does this play out? My assumption is that Nvidia doesn't have proportional pricing power for some reason. If demand increases 10x, they can't raise prices to the same extent (correct me if I'm wrong).

How would that even play out then? Is everyone in the world simply stuck waiting for Nvidia's capacity to meet demand?

There is obviously a huge incentive now to be competitive here, but is it realistic that anyone else might meaningfully meet demand before Nvidia can?


Their prices are already high enough :) Base price of the H100 is something like $36000 USD.


As a ballpark you can guess ~100x the margin of gaming GPUs, since the transistor counts and architecture are similar enough to high-end gaming GPUs that sell for a fraction of the price, and the extra RAM isn't going to close that gap.

When you have 100x the margin on a product you only need to sell 1% as many to almost double profit. 10% would be more than 10x profit.
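
Worked through with toy numbers (the per-unit margins and volumes are assumptions, not Nvidia's actual figures):

    # Toy illustration of why a high-margin datacenter part moves profit so
    # much; per-unit margins and volumes are assumptions, not real figures.
    gaming_margin = 100                  # assumed profit per gaming GPU ($)
    dc_margin = 100 * gaming_margin      # "~100x the margin" per H100-class part
    gaming_units = 1_000_000             # assumed gaming volume

    base_profit = gaming_units * gaming_margin
    for dc_fraction in (0.01, 0.10):
        total = base_profit + gaming_units * dc_fraction * dc_margin
        print(f"Selling {dc_fraction:.0%} as many DC parts: "
              f"{total / base_profit:.0f}x baseline profit")

That prints 2x and 11x the baseline, which is where the "almost double" and "more than 10x" figures come from.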


Very good article! Nice insight as to who/what/where and how much :)


I appreciate it, thanks for the comment


By far the biggest issue is utilisation of the GPUs; if people worked on that instead of throwing more hardware at problems, this would be way less of a problem.


And you are under the impression that people are not working on that?


I know people are working on it, but IMO it's not spoken about anywhere near as much as slapping more GPUs on a problem. I'm talking about enterprise workload orchestration, not just making a model faster.


It's spoken about all over github and the open source AI ecosystem.

HN and casual media are heavily attention biased towards big money and big datacenters, yes.


Historically, how good is the open source AI ecosystem?

Usually community projects have different target constraints, right? (For example, Mastodon doesn't even want to be Twitter; it wants its own thing, it wants to be a different answer to the social network/media question, even if there are obvious and fundamental similarities.) How does this play out in the open source AI communities?


The discussion on how different industries are vying for the same limited supply of components adds another layer of complexity to the situation. It's intriguing to see how GPUs are being sought after by diverse sectors, and this phenomenon showcases the versatility and importance of these components in modern technology.


This is a very high quality writeup.


Thank you


Jensen could write one of the clouds a license to use 4090s in a DC and make this crunch disappear overnight (it would be rough for gamers though).


4090s have 24GB of 384-bit-wide GDDR6X with no ability to interconnect that memory to other 4090s except through PCIe bandwidth.

H100s have 80GB of 5120-bit HBM with SXM NVLink for 8-at-a-time in a rack.

HUGE difference in bandwidth when doing anything where inference on the model needs to be spread over multiple GPUs, which is the case for all LLMs. And even more of a difference when training is in play.
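
Rough spec-sheet numbers for scale (approximate public figures, not measurements):

    # Approximate bandwidth figures in GB/s; treat these as rough spec-sheet
    # numbers rather than measured values.
    bandwidths = {
        "RTX 4090 memory (GDDR6X)":      1008,
        "H100 SXM memory (HBM3)":        3350,
        "PCIe 4.0 x16 (per direction)":    32,
        "NVLink, H100 SXM (aggregate)":   900,
    }
    for name, gbps in bandwidths.items():
        print(f"{name:32s} ~{gbps:>5} GB/s")

Splitting a model across 4090s forces activations through the ~32 GB/s PCIe link, while NVLink between H100s is more than an order of magnitude faster, which is the gap in play here.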


There's A6000 Ada for that (you can rent servers with 4xA6000 at Lambda Labs). Moreover, 4090 has only 24GB memory, H100 has 80GB.


4090 (and all consumer chips of its class) have terrible efficiency and are not suitable for use in a DC.


I know it’s off topic, forgive me HN Gods, but this was right at the top of the article and threw me off:

> Elon Musk says that “GPUs are at this point considerably harder to get than drugs.”

Does Elon have a hard time getting drugs?


Sounds like no to me.


What is with this webpage, it appears to be blocked by pretty much every browser?


Works fine on Chrome/Mac - it is however a very strange article structure. It was obviously very well researched with a lot of interesting information but I found it very hard to read - Q/A style, hardly any paragraphs with more than a sentence, sections where the heading is longer than the content, a huge number of quotes.


FWIW, works fine on a MacOS Safari client with zero customization.


Works fine on Firefox on Android

Also works on Chrome on Android


It's working fine in Chrome on Windows.


This is a pretty awful article.

AFAICT it consists of a bunch of anecdotes by thought-leader types followed by a corny-ass song.

HN, you can do better. I believe in you. Try harder.


It is a well researched report on the GPU availability crisis. The song is funny, especially Mark riding a LLaMA and the cabaret dance.


Shockingly well-researched, even. I found it super informative towards understanding the current market. Really impressive aggregation of sources. I wish it had some authors listed so that I could see some of their other work.


Thanks! (Author here, see other work in my HN submissions and comments)


It looks like they didn't scroll past the embedded video, so they didn't even get to the table of contents.

>"HN, you can do better"

- indeed.


I appreciate it, and I'm glad you liked the song


I just forwarded this solid article to my colleagues. Did you perhaps miss the meat of the article that's after the video?

PS: The song is also very good.


I love a good lyric swap, I like the original song, and I couldn't make it through one minute of the song. Song is not good.


I appreciate the comment and am glad you liked the song.


I bet you listen to stuff like http://www.openbsd.org/lyrics.html


A lot of it is from people with first hand knowledge of buying or selling capacity/hardware.


If you scroll down past the corny song you'll find the table of contents for a very thorough article. The table sort of looks like footer text to the web page if you don't look too carefully, I almost missed it myself.


It has parts where I thought it was AI-generated.


that would explain a lot



