I have to say that I would much rather invest in Groq at a $2.8b valuation than NVDA at a $2.55T valuation. Hard to imagine a world where those two numbers are both correct at the same time.
Nvidia has tons of other products, and they have been delivering AI accelerators for many years now. Groq, on the other hand, is basically a gamble at this point.
In literature on the financial markets, authors routinely discuss how retail will decide to invest in companies in less time than it takes them to order lunch, or in this case write a comment on a forum.
This comment is the epitome of that considering how Groq has near zero revenue while Nvidia brings in close to $100b annually with over 50% profit margins.
This is how both numbers can be correct at the same time.
Yeah, except I worked for a decade doing investment management at large hedge funds. And I'm also very familiar with the product offerings of both companies and think Groq is on an extreme growth trajectory. NVDA is extremely overvalued right now based on optimism around AI compute. If that optimism is warranted, then Groq is likely to 10x or more from here in the next few years.
There’s no comparison between those two though. You can go and get an RTX 3060 for $300 and run Llama3 8B. To do the same with Groq’s chips you’d need to buy 25 of them at $20k each. To run something like the 70B model you need at least $6M in hardware.
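Taking the figures quoted in this comment at face value (they are the commenter's numbers, not official Groq pricing), the minimum-capex gap works out like this:

```python
# Back-of-the-envelope capex comparison using the figures quoted above.
# All prices are as claimed in the comment, not official list prices.
chip_price = 20_000      # $ per Groq chip, as quoted
chips_for_8b = 25        # chips claimed to be needed for Llama3 8B
rtx_3060_price = 300     # $ for a consumer GPU that can run the 8B model

groq_8b_cost = chips_for_8b * chip_price
print(groq_8b_cost)                        # $500,000 minimum buy-in
print(groq_8b_cost // rtx_3060_price)      # ~1666x the consumer-GPU price
```

So even for the smallest model in the comparison, the entry price differs by three orders of magnitude, which is the commenter's point.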
Things don't have to be exactly in the same category to usefully compare them. Groq has emerged as the premier way to host LLMs at scale at speeds that are dramatically faster than anyone else, which allows for all kinds of new and exciting applications that wouldn't work if you had to wait for 50 tok/sec responses, but which feel magical at 500 tok/sec. And although the excitement with LLMs seems to be around training them, I think if you look out a few years, vastly more FLOPs will be expended on inference than on training.
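The 50 vs 500 tok/sec difference is easy to put in wall-clock terms; a quick sketch (the 1000-token response length is illustrative, not from the thread):

```python
# Wall-clock time to stream an LLM response at different decode speeds.
def response_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

tokens = 1000  # an illustrative long-ish response
print(response_seconds(tokens, 50))   # 20.0 s -- a noticeable wait
print(response_seconds(tokens, 500))  # 2.0 s  -- feels near-instant
```

A 20-second wait rules out interactive use cases (agents chaining many calls, live voice) that a 2-second response makes practical.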
I don't think that's really a fair comparison. Like if you wanted to sell ball bearings you could spend $50 and start dripping lead into a bucket of water, or you could spend millions on a factory. The latter is definitely the better business model. What matters is quality of result (latency and tokens/s/user) and throughput per watt or capex dollar, not your minimum capex, so long as you're running at any kind of scale.
How is throughput per watt affordable? Each card has a TDP of 275 watts, and a maximum draw of 375. If you try to run a big model like Llama-3 400B you're looking at a cluster of 1,800 cards, and a total power draw of nearly 500 kW.
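The arithmetic behind that figure, using the card count and TDP from this comment (the 1,800-card estimate is the commenter's, not an official deployment spec); note that power is measured in kW, while energy over time is kWh:

```python
# Cluster power draw from per-card TDP (figures from the comment above).
cards = 1800          # cards claimed for a Llama-3 400B deployment
tdp_watts = 275       # per-card TDP; peak draw is quoted as 375 W

total_kw = cards * tdp_watts / 1000
print(total_kw)               # 495.0 kW sustained draw
print(cards * 375 / 1000)     # 675.0 kW at peak
print(total_kw * 24)          # 11880.0 kWh of energy per day
```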
My question is: If what Groq is doing is so good, why can't Nvidia copy it?
As far as I know, Groq AI accelerators are really fast because they load the model into SRAM only and spread the model out over many Groq chips. They don't use HBM or off-chip RAM at all.
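If the widely reported figure of ~230 MB of on-chip SRAM per Groq chip is right (treat that number and the fp16 assumption as assumptions here, not thread facts), the chip counts follow directly from model size:

```python
import math

# Chips needed to hold a model's weights entirely in on-chip SRAM.
# 230 MB per chip is the commonly reported GroqChip figure (assumption).
SRAM_PER_CHIP_GB = 0.230

def chips_needed(params_billions: float, bytes_per_param: int = 2) -> int:
    model_gb = params_billions * bytes_per_param  # fp16 weights assumed
    return math.ceil(model_gb / SRAM_PER_CHIP_GB)

print(chips_needed(8))    # ~70 chips just to hold an 8B model
print(chips_needed(70))   # ~609 chips for a 70B model
```

Exact counts in practice depend on quantization, activations, and overhead, but this is why Groq deployments span racks of chips rather than single cards.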
That doesn't seem like a defensible moat to me.
Right now, I believe Nvidia's GPUs are geared more towards training. These startups tend to compete more in inference since it's simpler. I do think Nvidia needs to shore up their inference offerings more. Actually, buying Groq or building their own inference-only chips wouldn't be bad ideas.
It takes a while to design, manufacture, and test/validate new silicon. There is also a very elaborate software stack on top of it. Even for an extremely talented, nimble engineering team, reproducing all of that is a multi-year effort. And by then, Groq will have even more refined and powerful systems available.
Also, when an end market is growing as fast as the AI compute market is, multiple players can do very well for years at a time. I think Groq also has a high probability of being acquired by Google/Microsoft/Apple/Meta if the FTC allows it. But they are probably better off following the hockey stick growth for a couple more years before doing that.
Nvidia should have a far more well-oiled machine for designing and validating new silicon. Software stack? Nvidia already has that. All models are optimized for CUDA, not Groq.
At a very high level, Nvidia just needs to break out the Tensor cores into its own chips, add loads of SRAM, and run CUDA. They already have all the other datacenter stuff figured out such as interconnects.
Is there something special about this unlike the million other AI companies that justifies putting some company’s press release on the front page or was this the result of unnatural voting patterns?
AFAIK they are a hardware company producing special chips for processing AI tasks. Their response time is incredibly fast, like 500–900 tokens/s for LLMs. They're not just a company that buys Nvidia cards to serve as a backend.
In other words, they are competing with Nvidia, not the million startups using Nvidia.
I don't understand why all of their hubbub is about tokens per second (which relies on a lot more than just hardware) instead of FLOPs/memory bandwidth/etc.
They do make their own hardware, but of note: about four months ago they stopped selling hardware and pivoted to selling only API access. Hence no need to describe anything aside from tokens/sec?
It is incredibly confusing, Groq was first (by a lot) and they did write a nice letter (https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/), but understandably they probably don't want to burn money suing the richest person in the world.