I have to say that I would much rather invest in Groq at a $2.8b valuation than NVDA at a $2.55T valuation. Hard to imagine a world where those two numbers are both correct at the same time.
Nvidia has tons of other products, and they have been delivering AI accelerators for many years now. Groq, on the other hand, is basically a gamble at this point.
In literature on the financial markets, authors routinely discuss how retail will decide to invest in companies in less time than it takes them to order lunch, or in this case write a comment on a forum.
This comment is the epitome of that considering how Groq has near zero revenue while Nvidia brings in close to $100b annually with over 50% profit margins.
This is how both numbers can be correct at the same time.
Yeah, except I worked for a decade doing investment management at large hedge funds. And I'm also very familiar with the product offerings of both companies and think Groq is on an extreme growth trajectory. NVDA is extremely overvalued right now based on optimism around AI compute. If that optimism is warranted, then Groq is likely to 10x or more from here in the next few years.
There’s no comparison between those two though. You can go and get an RTX 3060 for $300 and run Llama3 8B. To do the same with Groq’s chips you’d need to buy 25 of them at $20k each. To run something like the 70B model you need at least $6M in hardware.
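Taking the figures quoted in this comment at face value (they are the commenter's numbers, not official Groq pricing), the minimum-capex gap works out like this:

```python
# Back-of-the-envelope capex comparison using the figures quoted above.
# All prices are as claimed in the comment, not official list prices.
chip_price = 20_000      # $ per Groq chip, as quoted
chips_for_8b = 25        # chips claimed to be needed for Llama3 8B
rtx_3060_price = 300     # $ for a consumer GPU that can run the 8B model

groq_8b_cost = chips_for_8b * chip_price
print(groq_8b_cost)                        # $500,000 minimum buy-in
print(groq_8b_cost // rtx_3060_price)      # ~1666x the consumer-GPU price
```

So even for the smallest model in the comparison, the entry price differs by three orders of magnitude, which is the commenter's point.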
Things don't have to be exactly in the same category to usefully compare them. Groq has emerged as the premier way to host LLMs at scale at speeds that are dramatically faster than anyone else, which allows for all kinds of new and exciting applications that wouldn't work if you had to wait for 50 tok/sec responses, but which feel magical at 500 tok/sec. And although the excitement with LLMs seems to be around training them, I think if you look out a few years, vastly more FLOPs will be expended on inference than on training.
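The 50 vs 500 tok/sec difference is easy to put in wall-clock terms; a quick sketch (the 1000-token response length is illustrative, not from the thread):

```python
# Wall-clock time to stream an LLM response at different decode speeds.
def response_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

tokens = 1000  # an illustrative long-ish response
print(response_seconds(tokens, 50))   # 20.0 s -- a noticeable wait
print(response_seconds(tokens, 500))  # 2.0 s  -- feels near-instant
```

A 20-second wait rules out interactive use cases (agents chaining many calls, live voice) that a 2-second response makes practical.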
I don't think that's really a fair comparison. Like if you wanted to sell ball bearings you could spend $50 and start dripping lead into a bucket of water, or you could spend millions on a factory. The latter is definitely the better business model. What matters is quality of result (latency and tokens/s/user) and throughput per watt or capex dollar, not your minimum capex, so long as you're running at any kind of scale.
How is throughput per watt affordable? Each card has a TDP of 275 watts, and a maximum draw of 375. If you try to run a big model like Llama-3 400B you're looking at a cluster of 1,800 cards, and a total power draw of nearly 500 kW.
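The arithmetic behind that figure, using the card count and TDP from this comment (the 1,800-card estimate is the commenter's, not an official deployment spec); note that power is measured in kW, while energy over time is kWh:

```python
# Cluster power draw from per-card TDP (figures from the comment above).
cards = 1800          # cards claimed for a Llama-3 400B deployment
tdp_watts = 275       # per-card TDP; peak draw is quoted as 375 W

total_kw = cards * tdp_watts / 1000
print(total_kw)               # 495.0 kW sustained draw
print(cards * 375 / 1000)     # 675.0 kW at peak
print(total_kw * 24)          # 11880.0 kWh of energy per day
```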
My question is: If what Groq is doing is so good, why can't Nvidia copy it?
As far as I know, Groq AI accelerators are really fast because they load the model into SRAM only and spread the model out over many Groq chips. They don't use HBM or off-chip RAM at all.
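If the widely reported figure of ~230 MB of on-chip SRAM per Groq chip is right (treat that number and the fp16 assumption as assumptions here, not thread facts), the chip counts follow directly from model size:

```python
import math

# Chips needed to hold a model's weights entirely in on-chip SRAM.
# 230 MB per chip is the commonly reported GroqChip figure (assumption).
SRAM_PER_CHIP_GB = 0.230

def chips_needed(params_billions: float, bytes_per_param: int = 2) -> int:
    model_gb = params_billions * bytes_per_param  # fp16 weights assumed
    return math.ceil(model_gb / SRAM_PER_CHIP_GB)

print(chips_needed(8))    # ~70 chips just to hold an 8B model
print(chips_needed(70))   # ~609 chips for a 70B model
```

Exact counts in practice depend on quantization, activations, and overhead, but this is why Groq deployments span racks of chips rather than single cards.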
That doesn't seem like a defensible moat to me.
Right now, I believe Nvidia's GPUs are geared more towards training. These startups tend to compete more in inference since it's simpler. I do think Nvidia needs to shore up their inference offerings more. Actually, buying Groq or building their own inference-only chips wouldn't be bad ideas.
It takes a while to design, manufacture, and test/validate new silicon. There is also a very elaborate software stack on top of it. Even for an extremely talented, nimble engineering team, reproducing all of that is a multi-year effort. And by then, Groq will have even more refined and powerful systems available.
Also, when an end market is growing as fast as the AI compute market is, multiple players can do very well for years at a time. I think Groq also has a high probability of being acquired by Google/Microsoft/Apple/Meta if the FTC allows it. But they are probably better off following the hockey stick growth for a couple more years before doing that.
Nvidia should have a far more well-oiled machine for designing and validating new silicon. Software stack? Nvidia already has that. All models are optimized for CUDA, not Groq.
At a very high level, Nvidia just needs to break out the Tensor cores into its own chips, add loads of SRAM, and run CUDA. They already have all the other datacenter stuff figured out such as interconnects.
Is there something special about this unlike the million other AI companies that justifies putting some company’s press release on the front page or was this the result of unnatural voting patterns?
AFAIK they are a hardware company producing special chips for processing AI tasks. Their response time is incredibly fast, like 500–900 tokens/s for LLMs. They're not just a company that buys Nvidia cards to serve as a backend.
In other words, they are competing with Nvidia, not the million startups using Nvidia.
I don't understand why all of their hubbub is about tokens per second (which relies on a lot more than just hardware) instead of FLOPs/memory bandwidth/etc.
They do make their own hardware, but of note: about four months ago they stopped selling hardware and pivoted to selling only API access. Hence no need to describe anything aside from tokens/sec?
It is incredibly confusing, Groq was first (by a lot) and they did write a nice letter (https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/), but understandably they probably don't want to burn money suing the richest person in the world.