Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c

tomalaci · 2024-07-11T19:36:58 1720726618

With how much NVidia is developing AI-workload accelerating hardware, I expect this will cost maybe few dozen dollars and train in few hours within next few years.

What I think will be interesting is when commodity hardware can run cheap inference from very capable, specialized models. Pretty sure it will spawn a new golden age of AI-powered desktop applications.

For example, video game space has already been trying to create AI-powered NPCs, world generation and story-telling (e.g. Inworld AI).

swatcoder · 2024-07-11T19:57:33 1720727853

> For example, video game space has already been trying to create AI-powered NPCs, world generation and story-telling (e.g. Inworld AI).

This'll be a niche for a long, long time.

Games are generally carefully crafted to deliver a specific mechanical and/or narrative experience. A world populated by LLM/etc bots or content is one choice of what that experience might be, but it's not going to be a very satisfying one for many game designers -- especially given the current/near state of the technology. There will be games and experiments that explore it, for sure, but the vast majority of games just don't have any need for it.

123yawaworht456 · 2024-07-11T20:45:29 1720730729

for narrative/dialogue, yes, text generation is currently useless. censorship, slop, extreme positivity bias. even jailbroken Opus is shit.

but audio generation we already have is pretty much good enough, and this is big. it's not AAA tier yet, sure, but still lightyears better half-assing it with mediocre voice actors. it is now an option to only use real voice actors for a few key characters, and even that won't be necessary within the next decade. even indie video games will be fully-voiced soon.

glial · 2024-07-11T22:24:27 1720736667

Similar for textures - it would be really neat if textures were auto-generated to add detail when you get close to something. As it is, the sprites just look bad.

miki123211 · 2024-07-12T02:09:46 1720750186

Same for dubbing.

Voicing a game in English is doable, voicing a game in n languages is only doable if you're the size of EA / Activision / Ubisoft.

jetrink · 2024-07-11T20:37:01 1720730221

Those don't have to be mutually exclusive though. To take AI out of it, think of those murder mystery parties where actors interact with attendees. Actors have roles to play and things they must do to move the story forward, but they improvise their dialog when talking with the players and sometimes each other. Or if you've ever played D&D, you have experienced talking with NPCs that are controlled by your DM. I think video game AI could be a lot like that, where NPCs use natural language instead of rigid dialog trees, but otherwise, they behave a lot like they do today.

swatcoder · 2024-07-11T20:47:20 1720730840

Yes, that's the intuition for where the technology might go. Someday.

But actors and DM's are much more disciplined than LLM's, partly because they have careers and friendships on the line for misbehavior. For what amazing things they can do in good weather, LLM's are not really reliable when you want them to consistently deliver something very specific, very secure, or very artfully crafted. They may get there, but their design makes it a very hard problem that we're still a long way from seeing commercialized.

jononor · 2024-07-11T23:16:40 1720739800

Agreed that "live" in-game LLMs for NPCs is probably a little while away. Need better tools than now that allows to strike a balance between constrained/directed output and variation. I suspect that LLMs are already used by (some) game developers to aided dialogue creation tools for game developers, where they curate and select more or less a fixed vocabulary. I do think that there are many viable steps to "generic". For example, if the game developer could specify possible dialog trees (as in the possible branches and outcomes), then an LLM could ideally fill in the concrete text when reaching the different paths. Or an LLM could add in "small talk" - meaning things that reflect the world state, recent events, but still have the quest type dialog be practically hardcoded (so there is no risk of unreachable states etc).

thatguymike · 2024-07-11T20:41:04 1720730464

Ah, like this: https://www.youtube.com/watch?v=Kw51fkRiKZU

headcanon · 2024-07-11T20:15:40 1720728940

I do think there is a big opportunity for widely supported hardware-accelerated matrix algebra in games. Currently most of that is geared towards graphics (naturally) but being able to easily encode arbitrary models and have them run on-device would open up a lot of opportunities for games (like deep simulation) that weren't possible before. Its currently possible of course but requires custom tooling and (relatively) niche hardware like a high-end graphics card.

I see the development energy around LLMs as a way to open up support for that.

opyate · 2024-07-12T07:44:59 1720770299

Perhaps Llama are good enough now to add flavour NPCs to the world, i.e. NPCs not on your critical path, but maybe on the town square where you can hear them talk about mundane life stuff.

ImHereToVote · 2024-07-11T20:03:03 1720728183

I don't have needs. I have wants. You see, I don't need. I want.

up2isomorphism · 2024-07-11T19:51:57 1720727517

> For example, video game space has already been trying to create AI-powered NPCs, world generation and story-telling (e.g. Inworld AI).

To me this is a downside compared to the NPC generated by humans, since that’s the only reason I would like to read them.

bick_nyers · 2024-07-11T19:59:14 1720727954

What if the LLM that a specific NPC utilizes was handcrafted/fine-tuned by a human?

MisterBastahrd · 2024-07-11T19:57:32 1720727852

You don't even necessarily need to have them coming up with valid speech. Simply giving out random quests and rewards would keep people running on a loot treadmill for most open world multiplayer games.

LeanderK · 2024-07-11T20:00:41 1720728041

I think at first even background NPCs that don't give quests and rewards would be nice. Sort of give everyone a character and let them just babble. It breaks the immersion to hear the same phrases repeatedly. You can still handcraft every important quest and interaction to achieve high fidelity, but I would like random NPCs you bump into to not just repeat things all the time.

swatcoder · 2024-07-11T20:17:44 1720729064

You're more likely to see studios (who want lots of more varied content) use generative AI in the studio, where they might generate and review it before release.

Letting generators run free on the client sets up different kinds of immersion-breaking, where NPC's hallucinate misleading details about the story/world, can be tricked into reciting off-topic absurdities or age-rating violations, etc. AAA studios can't afford the embarrassment of that and smaller designers with pride of craft won't see their signature come through the art in it. Surely, some designers will figure out ways to make it work great for some specific idea, but it's not the best way to use the technology in most cases.

fragmede · 2024-07-16T18:24:26 1721154266

Sounds like there's a ton of money to be had to someone that can solve those problems though. A version of fallout where I can go up to people/things and just start talking to them, instead of picking from a list of things to say? :shutupandtakemymoney:

swatcoder · 2024-07-11T20:05:08 1720728308

To make this work, you need your LLM-based AI to outperform any other form of generating quests and rewards -- and that performance is measured on things like player enjoyment, game/story progression, exploitability, client system requirements or server operating costs, etc and most of those things are very hard to constrain or optimize for with an LLM right now.

While the costs are hidden from end users and are going down quickly, good LLM's remain very expensive to run and very hard to keep on track compared to other options.

MisterBastahrd · 2024-07-12T17:01:21 1720803681

You're trying to big brain something that isn't that complicated or special.

I never said that every interaction with an NPC must necessarily require AI to create a new quest and reward from scratch. Players aren't that savvy. You just generate a pool of quests and assign them to NPCs in the instance. Players who are addicted to loot treadmills will stay in the game and pay for boosts and other rewards as long as there is content to engage them. This more than pays for the AI service.

techjamie · 2024-07-11T20:31:53 1720729913

You could maybe pull this off in a game like Borderlands where the loot is basically just the same dozen guns but with different numbers and effects. But as is, the LLM text isn't going to be much different than a sufficiently large AdLib system.

I think there is value to be gained in having LLMs as part of the development process, maybe even the game itself, but I think conventional methods are about as sufficient for quests.

chongli · 2024-07-11T20:14:24 1720728864

Yeah. I have played a bunch of the roguelike Caves of Qud [1] and it has both hand-written text and procedurally generated text. The former is quite interesting and relevant to both gameplay and plot. The latter is mostly uninteresting and irrelevant, though it does work as "filler." This is similar to how procedurally-generated grass can give a more natural look to a hill than you'd get with tiles (which are incredibly easy to spot unless a ton of work is put into hiding the seams and repeating patterns).

I still long for the day when we can have procedurally-generated stories and quests that are actually interesting to play through. I have no idea how that is going to work though!

[1] https://www.cavesofqud.com

FooBarBizBazz · 2024-07-12T13:35:49 1720791349

> I still long for the day when we can have procedurally-generated stories and quests that are actually interesting to play through.

It's an interesting artistic and technical challenge to investigate, absolutely. I hope people work on it, and I'm sure they already are.

However, let me also offer a counterpoint with something much simpler: coffee.

It is possible to fully automate the process of making an espresso drink. You can buy cheap versions of these machines that will sit on your kitchen counter; the quality of the drinks so produced is not the highest, but I suspect it's entirely possible to build a machine that would actually make high-quality drinks, perhaps with even more precision than a human can.

Yet the opposite trend has prevailed over the last few decades. Not long ago, Americans drank drip coffee (hence the approximation thereof as the Americano). This is an easy process to make in large batches with minimal labor. It is almost fully automatic, without any "automation". And certainly this still exists all over the place. But we now also have coffee shops everywhere -- witness the explosion of Starbucks out of its home in Seattle -- where people, baristas, individually make espresso drinks for customers. This is immensely popular.

Why are tons of people now employed in this way, and why hasn't it been automated? Is it really just a matter of cost?

I don't think so.

I think the knowledge that another person took the trouble to do something for you is in fact part of the product. Consider the practice of drawing designs, like leaves or hearts, in the milk foam. This is entirely unnecessary, from a taste perspective. If you put a lid on it, as is commonly done, you will not even see it. But it carries a message -- that somebody gave enough of a damn to do it.

(We could then analyze how this phenomenon gets watered down and eventually destroys itself when it tries to turn itself into a mass-produced fast-food franchise staffed by underpaid/exploited proletarians, but that would take this post in another direction.)

My point is, I think something similar happens in video games. The very knowledge that another person was involved is important. You are receiving communication from this person. It does something to synchronize, partially, your mind with theirs. And this is a thing we appreciate. If there's no person on the other end, why should we care?

This general phenomenon of paradoxiciality can be a bad thing. It can "feel like" socializing, without producing the actual, real, thick social networks that give us rich lives. There can be an exploitative and even druglike dynamic. It isn't entirely a good class of phenomena.

But in small doses I think it's good, useful, important. I think it's a core part of what makes art valuable. And of course, it applies to games.

This is why, I believe, we will not fully automate the telling of stories, or the making of lattes. The person doing it is important.

chongli · 2024-07-12T15:14:44 1720797284

I think the knowledge that another person took the trouble to do something for you is in fact part of the product.

It's an interesting thesis but I don't for a second believe Starbucks sells billions of coffees because people want the barista experience. Starbucks would absolutely automate their business with fancy machines if they could. The problem is that the drinks are so complicated and customizable that no one has built a machine capable of making them all.

Plus there are plenty of people who order Starbucks drinks through an app and never actually meet the barista who made them. If those drinks were made at some fully automated commissary and delivered by drone they'd be just as happy.

The very knowledge that another person was involved is important. You are receiving communication from this person.

That's important for some people, and some games, but not for everyone. I play roguelikes mainly for the challenge. The procedurally generators in these games can create bizarre and very challenging situations no human could ever come up with.

If there's no person on the other end, why should we care?

Because it's a puzzle for your mind to figure out. It's why people play solitaire games (with a deck of cards), random sudokus, tetris, etc.

FooBarBizBazz · 2024-07-13T12:54:53 1720875293

> Plus there are plenty of people who order Starbucks drinks through an app and never actually meet the barista who made them.

Yeah, that's a really strong counterargument.

FooBarBizBazz · 2024-07-13T12:53:04 1720875184

> This general phenomenon of paradoxiciality

That was supposed to be "parasociality".

robbomacrae · 2024-07-11T20:14:06 1720728846

Whilst I agree with the reservations of the other replies I think you were implying in the future and I'm sure the LLM's will be more trustworthy and up to the task at some point.

What I would really like to see now is all the new TTS models being used more widespread. There are still so many games that have text only output. My kid love Alba: A wildlife Adventure but the eldest still isn't quite ready to read all the text so I have to sit with them reading out all the lines.

If anyone has a way of applying universal mods / accessibility features to existing games I'd love to see someone solve this and happy to help with the TTS!

talldayo · 2024-07-11T19:49:37 1720727377

> I expect this will cost maybe few dozen dollars and train in few hours within next few years.

I wouldn't count on it. Nvidia's been cleaning up shop, but their best option for expanding right now is through parallelization (bigger clusters, basically). Now that Blackwell is on TSMC, Nvidia is alongside Apple waiting for new and denser nodes to upgrade to. A real "generational leap" in training cost is going to require some form of efficiency gain that we're not seeing right now. It's possible that Nvidia has something up their sleeve, but I'm not holding my breath.

> What I think will be interesting is when commodity hardware can run cheap inference from very capable, specialized models.

What's funny is, you basically already can. The problem is becoming integration, and in the case of video games, giving the AI a meaningful role to fill. With today's finest technology, you can enjoy an AI-generated roguelike that is nigh-incomprehensible: https://store.steampowered.com/app/1889620/AI_Roguelite/

As time goes on, I really think developers are just going to not use AI for video games. Maybe I'm missing the "minecraft moment" for procedurally-generated stories here, but the sort of constraints needed to tell a story of create an interactive experience don't exist within LLMs. It's a stochastic nightmare of potential softlocks, contradictions or outright offensive requests. The majority of places I've seen AI applied today isn't for content creation, but instead automated moderation.

forrestthewoods · 2024-07-11T19:52:11 1720727531

> For example, video game space has already been trying to create AI-powered NPCs, world generation and story-telling (e.g. Inworld AI).

Current AI isn't even close to good enough for video game NPCs and related. We're several breakthroughs away from that being possible at any cost. Those breakthroughs might happen in 3 years, or they might not happen in 10. Hard to predict.

Tiberium · 2024-07-11T19:53:44 1720727624

Are you sure? Models like Claude 3.5 Sonnet are both good at writing and instructions, as long as you set some guardrails for the model, they can be great NPCs.

forrestthewoods · 2024-07-11T20:26:37 1720729597

> Are you sure?

Absolutely.

Current LLMs have insufficient world state. Imagine a game like Stardew Valley. It's got a town with 30 NPCs or some such. They all have personalities and the player builds a relationship with them over time. Current LLMs can't do that. They hallucinate waaaaaaay too much. You can't reliably define and evolve relationships. Amongst many other short comings.

I'm super pro AI and use ChatGPT all the time for programming. So I'm not being an AI hater. But I am a gamedev and I can say that what exists today simply isn't good enough.

Tiberium · 2024-07-11T20:55:59 1720731359

But you're saying that you want a single model to handle all NPCs and the whole world. Of course this isn't possible currently. But using a separate model with separate context for each character is. Also, if you use ChatGPT for programming, try Claude 3.5 Sonnet - it's really better than GPT-4o for programming.

forrestthewoods · 2024-07-11T21:23:08 1720732988

> But you're saying that you want a single model to handle all NPCs and the whole world.

No, I did not say that at all. I didn't specify how the LLMs may or may not be structure. I'm saying that current LLMs - and yes I've used Claude 3.5 Sonnet - are insufficient. There is no existence proof that they're sufficient.

LLMs are great. They aren't great enough for video game NPC. Not yet. Further innovation is needed. You're free to disagree. I can't prove a dispositive. But there is no working example.

ebalit · 2024-07-12T00:55:24 1720745724

As a gamedev, what would you use currently instead for NPCs? And what would you say makes this/those solutions better than LLMs?

forrestthewoods · 2024-07-12T02:07:33 1720750053

Dialogue systems vary by game and are generally custom. But they amount to hard coded if-else branches if you squint. The only one I ever wrote was a simple announcer for a sports-ish shooter.

Your question is weird. Every video game ever made has shipped using not LLMs. Not a single commercial game has ever shipped with an LLM. So I’d say what makes classic NPC systems better is they’ve shipped tens of thousands of commercial successes over 50+ years. And what makes LLMs worse is they haven’t once been proven viable for even a single title. Nor have they produced even a compelling tech demo.

Geez people.

ebalit · 2024-07-16T18:21:30 1721154090

My question was genuine as I'm not from the gamedev domain and I might have missed the real state of the art.

Hard coded dialogs often feel very unnatural and limiting. I can see why people want to explore LLM to try to make new experiences possible.

I can see it becoming a new dimension of game design, open vs closed dialogs, like there is currently open vs closed world. And as in the open vs closed world, they will probably coexist instead of one type replacing the other.

Tiberium · 2024-07-12T13:41:28 1720791688

There are multiple games that use LLMs, available in Steam.

forrestthewoods · 2024-07-12T15:35:04 1720798504

Name them and demonstrate they don’t suck.

Tiberium · 2024-07-12T23:04:21 1720825461

First you were talking about those games just existing, now you want me to prove to you that "they don't suck". That doesn't seem fair. Anyway, here you go:

https://store.steampowered.com/app/1519310

https://store.steampowered.com/app/1889620

Those are the two that I could easily remember, there are much more games (already in the 10s) that embed the LLM APIs for things like dialogue generation for some parts of the game, e.g. https://store.steampowered.com/app/2530950/

charlescurt123 · 2024-07-11T20:17:11 1720729031

I imagine we could do this now but not the way you think.

have a human created story and text as a guideline.

With that have genAI make the text per stage, you would get different statements every time and would stay on track.

Would be interesting to play a game where all players say the same information in slightly different ways every single playthrough.

HPsquared · 2024-07-11T20:47:19 1720730839

Similar scaling to genome sequencing. First genome was a huge undertaking, now routine after a few Moore-esque cycles.

alecco · 2024-07-11T19:22:15 1720725735

Also https://x.com/karpathy/status/1811467135279104217#m

iforiq · 2024-07-11T19:30:14 1720726214

How much did gpt2 training cost when it came out in 2019?

ozr · 2024-07-11T19:36:25 1720726585

About $50,000.

withinboredom · 2024-07-11T19:37:51 1720726671

They probably spent more on the training data, to be honest. They had to get it the hard way.

alecco · 2024-07-11T19:52:55 1720727575

It will be interesting to see this with today's FlashAttention 3 for H100.

rurban · 2024-07-11T20:11:19 1720728679

Would be free for us because we have those H100`s, but currently it's way too hot now. They will reach 70°C, even watercooled.

jamestimmins · 2024-07-11T19:35:33 1720726533

Anyone have an idea if this is feasible to do on a Macbook with a built-in GPU?

michaelmior · 2024-07-11T19:37:58 1720726678

Probably not with the same amount of training time, but I'd imagine a recent MBP GPU could handle GPT-2 training. The biggest challenge is that the training would need to be reimplemented for Metal instead of CUDA.

rty32 · 2024-07-11T20:24:58 1720729498

Slightly off topic -- I just saw people saying how Mac's unified memory makes it a strength to train models on Macs: https://www.macrumors.com/2024/07/10/apple-leads-global-pc-g..., and how energy efficient they are etc. But what I am seeing is that people don't often even touch Macs at all -- they write code with CUDA and that's it. I find this kind of conversation fascinating.

jamestimmins · 2024-07-11T19:53:20 1720727600

Ah so I couldn't just run this on my laptop for ~48 hours? That's too bad.

danielmarkbruce · 2024-07-11T19:59:02 1720727942

He does it on 8 H100's in 24 hours, ie 192 H100 hours. It's going to be thousands of laptop hours.

mmoskal · 2024-07-11T20:03:41 1720728221

H100 SXM is 2000 TFLOPS at FP16. Multiply by 8.

M3 Max is 28 TFLOPS at FP16.

Based on FLOPS alone, it would be more like a year or two.

Davidzheng · 2024-07-11T20:05:58 1720728358

Can you estimate how long it would take to replicate alphago zero today on one set of 8xH100.

karpathy · 2024-07-11T20:17:40 1720729060

(H100 SXM is 1000 TFLOPS, *2 is from "with sparsity", which is not used here.)

mmoskal · 2024-07-11T20:35:00 1720730100

Right... and there are probably some communication overheads over NVLink that would not be present on single laptop. So a few months maybe :)

latchkey · 2024-07-12T01:49:44 1720748984

MI300x is 1300 TFLOPs at FP16 (without sparsity). Looking forward to seeing the results.

https://www.amd.com/en/products/accelerators/instinct/mi300/...

arthurcolle · 2024-07-11T19:36:23 1720726583

Will take more hours