Great work! Hacker News still seems to have a deeply skeptical culture with regard to machine learning - not sure why. There's always someone saying it's "not novel" and it's "just doing x".
Overfitting is a known issue in machine learning, people. If you still think all neural networks are doing is memorizing the dataset completely in the year 2021 - you might want to revisit the topic. It is one of the first concerns anyone training a deep model will have, and to assume this model is overfit _without_ providing specific examples is arguing in bad faith.
Sentdex has shown his GAN is able to generalize game logic like collision and friction with vehicles, and that it learns aspects of rendering such as a proper reflection of the sun on the back of the car.
He also showed weak points where the model is incapable of handling some situations, even attempting the impossible task of "splitting a car in two" to try and resolve a head-on collision. Even though this is a failure case, it should at least give you some intuition that the GAN isn't just spitting out frames memorized from the dataset, because that never happens in the dataset.
You will need to apply a little more rigor before outright dismissing these weights as merely overfit.
@sentdex Have you considered a guided diffusion approach now that that's all the rage? It's all rather new still but I believe it could be applied to these concepts as well.
Heh, yeah, tough crowd I guess. The full code, models, and videos are all released and people are still skeptical.
I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them. Drives me nuts. Not sure why all the hate when you could just see for yourself. I'd welcome someone who can actually prove the model just "memorized" every combo possible and didn't do any generalization. I imagine the original GameGAN researchers from NVIDIA would be interested too.
Interesting @ guided diffusion, wasn't aware of its existence until now. We've had our heads down for a while. Will look into it, thanks!
> I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them.
Honestly I think there's a big problem with page limits. My team recently had a pre-print that was well over 10 pages and we still didn't fit everything in, and then when we submitted to NeurIPS we had to cut it to 9! This seems to be a common problem and is why you should often check the different versions on arXiv. And we had more experiments and data to convey than in the pre-print. The problem is growing as we have to compare more things, and tables can easily take up a full page. I think this exaggerates the problem that always exists of not explaining things in detail and expecting readers to be experts. Luckily most people share source code, which helps show all the tricks the authors used, and blogging is becoming more common, which further helps.
> I'd welcome someone who can actually prove the model just "memorized" every combo possible
Indeed. Novel, efficient program synthesis is still novel, efficient program synthesis even if it's a novel, efficient data compression codec you're synthesising.
>> The full code, models, and videos are all released and people are still skeptical.
If you're uncomfortable with criticism of your work you should definitely try publishing it, e.g. at a conference or journal. It will help you get comfortable with being criticised very quickly.
Perhaps, but that criticism should be the easiest to ignore. The OP expresses frustration at lay criticism, and I expect that even brief contact with academic criticism will make the frustration the OP feels at lay criticism fade into irrelevance.
I've been learning about this stuff for about a year now. Your earlier experiments with learning to drive in GTA V were an inspiration for me - because they hit that perfect intersection of machine learning, accessibility in education, and just plain cool.
The pandemic hit and OpenAI had released DALL-E and CLIP. I was unemployed and bored with my Python skills and decided to just dive in. I found that a nice gentleman named Phil Wang on GitHub had been replicating the DALL-E effort and decided to start contributing!
We have a few checkpoints available with Colab notebooks ready, and there is also a research team with access to some more compute who will eventually be able to perform a full replication study and match a similar scale to OpenAI, and then some, because we are also working with another brilliant German team https://github.com/CompVis/ who has provided us with what they call a "VQGAN" (if you're not familiar) - an autoencoder that quantizes images into vision tokens, with the neat trick from GAN-land of using a discriminator in order to produce fine details.
We use their pretrained VQGAN to convert an image into digits. We use another pretrained text tokenizer to convert words to digits. Both sets of digits go into a Transformer, with a causal mask applied so the model can never peek ahead at the image tokens it is about to predict. The digits come out and we decode them back into text and image respectively. Then a loss is computed. Rinse, wash, repeat. Slowly but surely, text predicts image without ever having been able to actually _see_ the image. Insanity.
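If it helps make the wiring concrete, here's a minimal sketch of that text-then-image setup in PyTorch. To be clear, the class name, vocab sizes, dimensions, and the plain cross-entropy loss are my own stand-ins for illustration, not the actual DALLE-pytorch code:

    import torch
    import torch.nn as nn

    TEXT_VOCAB, IMAGE_VOCAB = 49408, 1024   # assumed BPE vocab + VQGAN codebook sizes
    TEXT_LEN, IMAGE_LEN = 64, 256           # 256 image tokens = a 16x16 latent grid

    class TextToImageTransformer(nn.Module):
        def __init__(self, dim=512, depth=6, heads=8):
            super().__init__()
            self.text_emb = nn.Embedding(TEXT_VOCAB, dim)
            self.image_emb = nn.Embedding(IMAGE_VOCAB, dim)
            self.pos_emb = nn.Embedding(TEXT_LEN + IMAGE_LEN, dim)
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, depth)
            self.to_image_logits = nn.Linear(dim, IMAGE_VOCAB)

        def forward(self, text_tokens, image_tokens):
            # Text tokens first, image tokens after.
            x = torch.cat([self.text_emb(text_tokens), self.image_emb(image_tokens)], dim=1)
            x = x + self.pos_emb(torch.arange(x.size(1), device=x.device))
            # Causal mask: each position attends only to earlier positions, so the
            # prediction for an image token never sees that token or anything after it.
            n = x.size(1)
            mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), diagonal=1)
            x = self.transformer(x, mask=mask)
            # The hidden state at position i predicts token i+1; grab the image positions.
            image_positions = x[:, TEXT_LEN - 1 : TEXT_LEN + IMAGE_LEN - 1]
            return self.to_image_logits(image_positions)

    model = TextToImageTransformer()
    text = torch.randint(0, TEXT_VOCAB, (2, TEXT_LEN))     # stand-in tokenized captions
    image = torch.randint(0, IMAGE_VOCAB, (2, IMAGE_LEN))  # stand-in VQGAN codes
    logits = model(text, image)
    loss = nn.functional.cross_entropy(logits.reshape(-1, IMAGE_VOCAB), image.reshape(-1))
    loss.backward()

At inference time you'd feed just the caption, sample image tokens one at a time, and hand them to the VQGAN decoder to get pixels back.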
Anyway, taking a caption and making a neural network output an image from it has again hit that "perfect intersection of machine learning, accessibility in education, and just plain cool". I don't know if you could fit it into the format of your YouTube channel but perhaps it would be a good match?
FWIW I saw your video a couple of days ago via Reddit and I loved it a lot. Even sent a link to the video to a friend of mine because I think it was a very inspiring and interesting video.
One of the main problems with ML/NN is that it often works like magic, i.e. the trick works as long as the audience doesn't know the secret behind it. It's fascinating to a gullible audience, mundane bordering on boring to practitioners.
>able to generalize various game logic like collision/friction with vehicles and also learns aspects of rendering such as a proper reflection of the sun on the back of the car
It did none of that. What this model did is learn all the frames of the video and their chronological order according to the input.
> impossible task of "splitting a car in two" to try and solve a head-on collision.
It played back both learned versions at once, like reporting the confidence of a round thing being 50% ball and 50% orange.
In the end, everything is boiling down to matrix math, so you can always make the argument that no neural network is impressive if you want.
The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.
Your original self driving GTA5 videos are what helped me come to understand machine learning in the first place (along with some of Seth Bling's MarI/O, and a bit of Tom7's learn/play-fun magic). I used your tech to make an AI that played Donkey Kong Country in LSNES emulator shortly before Gym-Retro was released.
Not offhand, but you've probably inspired a lot of creativity with this across the internet... and a lot of copy cats. I'm looking forward to seeing what gets made.
>> The model's size is ~173MB, depending on settings. That's not much space to have memorized every single possible combination of events, nor was our data enough to cover that either.
The resolution of the images output by the model is very low (what is it exactly, btw?). It's not impossible that your model has memorised at least a large part of its data.
In fact the simplest explanation of your model's output (as of much of deep neural networks for machine vision) is that it's a combination of memorisation and interpolation. There was a recent-ish paper by Pedro Domingos that proposed an explanation of deep learning as memorisation of exemplars similar to support vectors (if I understood it correctly - I only gave it a high-level read).
It's also difficult to see from your demonstration exactly what the relation between the output and the input images is. You're showing some very simple situations in the video (go left, go right), but is that all that was in the input?
For example, I'd like to see what happens when you try to drive the car over the barrier. Was that situation in the input? And if so, how is it modelled in the output?
Finally, how do you see this having real-world applications? I don't mean necessarily right now, but let's say in 30 years time. So far, you need a fully working game engine to model a tiny part of an entire game in very low resolution and very poor detail. Do you see this as somehow being extended to creating a whole novel game from scratch? If so, how?
Edit: on memorisation, it's not necessary to memorise events, only the differences between sets of pixels in different frames. For instance, most of the background and the road stays the same during most of the "game". Again, the resolution is so low that it's not unfathomable that the model has memorised the background and the small changes to it necessary to model the input. So, it interpolates, but can it extrapolate to unseen situations that are nevertheless predicted by the physics you suggest it has learned, like driving over the barrier?
That is impressive! Less than twice the size of the ResNet-50 weights. Surely that is within an order of magnitude of an equivalent Unity or Godot game+models.
> My Tiger repelling rock^^^^^^leopard detection model works great on all animal pictures ... until you feed it a sofa
I'm sorry, how is this different from normal software engineering? There are dozens of unit/integration testing memes poking fun at specifically this (which is a mostly solvable problem in ML btw, when you use out-of-distribution data. Give your model a 3rd end state that represents "neither").
> id did none of that, what this model did is learn all the frames of video and their chronological order according to the input.
A better explanation is that the network knows what frame to generate given the current frame (and n previous frames) and the current user input. If it were memorizing, it would have to store an extremely large number of scenarios (the count grows exponentially, since any given frame has k possible actions leading from it to the next frame). If Sentdex can run the game for arbitrary length and take arbitrary actions, then it is a far more reasonable explanation that the model is generating the frames rather than memorizing them. Apply Occam's Razor.
Edit: Sentdex said the model was ~173MB, so that is not large enough to memorize the gameplay.
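For intuition, a back-of-envelope version of that argument (every number here is an assumption picked for illustration, not a measurement from the project):

    actions_per_frame = 3   # e.g. steer left, steer right, no input
    fps = 30
    seconds = 10            # even a short ten-second rollout

    frames = fps * seconds
    distinct_trajectories = actions_per_frame ** frames
    print(f"{distinct_trajectories:.2e} distinct ten-second action sequences")
    # ~1.4e+143 -- a lookup table holding frames for every playable trajectory would
    # have to be astronomically larger than ~173 MB of weights.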
Maybe I'm misinterpreting, but if you've ever seen a cat freak out about a cucumber (an entire video genre, apparently), ostensibly real intelligences make similar errors.
Beyond rote memorization, it looks like it could be explained by saying the model appears to have found a concept of consonance and dissonance that is bounded within the field of its inputs, and a networked grammar for interacting with the up/down/left/right inputs. Some people might find that technically trivial, but as a layman I am impressed.
The "magic" part is that the response of the network appears to be so complex relative to its inputs, but given the input is so limited from a controller, it's easy to attribute more meaning to it when it is working with a finitely bounded simulated model.
Generally I'd wonder, if the behaviour appears more complex than the stimuli, do we tend to attribute intent to it?
Yes, when it is there for no valid reason, or ridiculous reasons. Skepticism is not a default position you can take like a toddler refusing to eat their vegetables. You need some informed (and non-fallacious) intelligent reasoning behind that. "I'm skeptic about this thing using X because X is so hyped these days" is not such reasoning.
Well, it kind of is. Blockchain has been hyped by charlatans as the cure to all the world's ills. That means when you read something about blockchain you should be especially suspicious.
Similarly, I've read too many people hyping up glorified chatbots as one step below AGI (see the :o reactions to GPT3), so I'm now extra skeptical about claims about machine learning.
"I'm skeptical about this thing using X to do Y because the burden of proof is on people claiming X does Y and historically they have failed to meet that burden"
I don't know what skepticism has to do with ridiculous toddlers - they are almost universally incapable of grasping the nuances of epistemology.
There's skepticism and then there's being a non-expert in a field and talking with high confidence. How do you differentiate these? Conspiracy theorists use the same logic. You're right that skepticism is good, but it is easy to go overboard.
Sure, but skepticism should decrease when a community of experts is saying the same thing. As an example, anti-vaxxers often claim skepticism and that they have done their own research. The reason we don't trust them is that we think doctors have greater expertise in the subject than they do (it is, either way, trusting someone). Unless you're a virologist you probably don't actually have the expertise to verify vaccine claims.
So sure, you are right, but in the context of this discussion you're implying that the vast majority of ML researchers (myself included) are charlatans. I'm not sure what the meaningful difference here is. We're publishing results, people are actively reproducing them, and then some person on the internet that doesn't understand the subject comes along and says "you're full of shit." We can even disprove the claims being made (e.g. I've explained why the network can't be memorizing the game in another comment). That is literally happening in this thread (GAN Theft Auto is in fact a replication/extension effort). Is that meaningfully different from the anti-vaxxers?
I think it's a problem when it turns into being skeptical for the sake of it.
I haven't been on HN too long, but the top comment on most threads is a contrarian one (which I truly appreciate because it provides a different POV). Sadly, because this is encouraged through high upvotes, the crowd's tendency is to regress towards this approach, even when the rigour of the critique is lacking.
It can be, but it's certainly not an unmitigated good. Especially when it leads to aspersions of fraud and conspiratorial thinking (e.g. rasz's comment thread below).
Skepticism is good when it targets bold claims with vague proof. This is not a bold claim (it's a video demo showing the process) and its proof is not vague (you can inspect the source). Skepticism over something like GPT-2 without more than sample output is good. Skepticism over GPT-2 with a workable demo and source is unhelpful.
I like your YouTube videos in general and think this content is a great benefit to the community.
I wouldn't take the few negative comments personally - I've seen many GAN architectures that heavily overfit (including my own bobross pix2pix) get a lot of praise, while 'less violating' models (like yours) get more skepticism. Skepticism isn't bad! But I'd wager in your case it may be because you're a YouTuber, and other ML YouTubers are notorious for ripping off content (e.g. Siraj).
Not really related to this, but I’d personally love to see the difference in training times it would take an RL agent to adequately learn to drive a car in gta versus adequately flying a helicopter.
One throwaway line about GAN operating systems now made me want to see a shell GAN. Keypresses as inputs, 80x24 terminal screens as outputs. Could a neural network dream of Unix?
Wow, what an incredible video and showcase. This really puts GPT-3's power into perspective. I can't wait till the public has access to something that powerful - or maybe I should enjoy not receiving GPT-3 phishing emails in my inbox.
GPUs: am I a joke to you? Instead of using them to render polygons, let’s use them to train neural networks that produce models that make them unnecessary. I’m oversimplifying - but pretty wild nonetheless.
Something I'd like to see is a visualization of subsets of the network's internal state that correlate with simple quantities like compass direction, velocity, position, etc. It'd be really fascinating to see where in the model these things are being learned, whether they are concentrated in a small area or spread out, and whether this is somewhat consistent across different iterations of the model.
Me too! In a much simpler setting a former colleague of mine, Jacob Hilton, tried such an exploration for the vision part of an OpenAI CoinRun model. It's the first part of this paper: https://distill.pub/2020/understanding-rl-vision/
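If anyone wants to try that on these weights, one generic starting point is a linear probe over saved hidden activations. This is just a sketch with stand-in data; the activation hooks and the logged ground-truth quantities (speed, heading, position) would have to come from the actual project:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # activations: (n_frames, n_hidden) hidden states captured at one layer during play
    # velocity:    (n_frames,) ground-truth speed logged from the game while recording
    rng = np.random.default_rng(0)
    activations = rng.normal(size=(5000, 512))                 # stand-in data
    velocity = activations[:, :8].sum(axis=1) + rng.normal(scale=0.1, size=5000)

    X_tr, X_te, y_tr, y_te = train_test_split(activations, velocity, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
    print("probe R^2 on held-out frames:", probe.score(X_te, y_te))

    # Large |coefficient| on a few units suggests the quantity is concentrated there;
    # a flat coefficient profile suggests it's spread out across the layer.
    top_units = np.argsort(np.abs(probe.coef_))[::-1][:10]
    print("most predictive hidden units:", top_units)

Repeating that per layer and per training run would get at the "concentrated vs. spread out" and "consistent across iterations" questions directly.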
Can someone explain a bit more on the long term applicability, or maybe other use cases that might be easier to appreciate?
The reason I ask is that it seems very challenging to generate the training data for such systems. Could someone explain how this can go further than just replicating X? Assuming some creative freedom, could you give an idea of what the long-term application of this would be?
NB: please take my questions at face value without thinking I'm implying this isn't cool for what it is. I'm all for people having fun. I'm all for projects not needing to tackle some grander issue.
That is an interesting thought. I don't fully understand how though. The main challenge is the training data. If you need to first create the interactive experience... What would the added value be?
One example is creating novel combinations based on a (large) training set. If the network had learned enough about what looks realistic, it could create novel game experiences based on a prompt of, say, a film or a book.
Looks interesting, if very far from practical-- too bad it requires a "DGX" station to train
It seems to flicker/fade things in a lot, like the random poles that keep appearing and disappearing. It seems like there is not enough focus on temporal consistency or something?
If you look at the source output image before it was upscaled, you'll notice the resolution is so low that thin objects "fall between" the pixels. The upscaler then interprets them as air, it seems.
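A toy illustration of that sampling effect (made-up sizes and a naive strided downsample, not the model's actual internals):

    import numpy as np

    hi_res = np.zeros((64, 64))
    hi_res[:, 33] = 1.0                  # a one-pixel-wide pole in the full-res frame

    lo_res = hi_res[::4, ::4]            # naive 4x downsample: keep every 4th pixel
    print("pole pixels at full res:", int(hi_res.sum()))         # 64
    print("pole pixels after downsampling:", int(lo_res.sum()))  # 0 -- the pole vanished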
It should be possible to take arbitrary video training data (whether from a game or real-life) and automatically reconstruct the 3D models of all vehicles in the scene (and the skybox) and "play back" the scene in a video game engine.
This is the direction virtual and augmented reality is headed (Facebook Codec Avatar, and their room reconstruction technology).
Impressive. Makes you wonder if at some point in the future there isn't a game engine any more but tons of training material and you play in a generated dream.
Certainly impressive. And sure, maybe in a distant future. Though I think this is one of those things where creating a working prototype that is 75% complete is the "easy" part. The other 25% (which you need for an actual working product) will take forever. Like self-driving cars, nuclear fusion, etc.
Text dungeon was very much a 75% complete thing, yet it's greatly entertaining in its own right. I would happily play a dream game which just falls apart sometimes.
I fail to see the novelty here. What's the size difference between the model and all of the 64x32 image training data? If the difference is not significant, you're basically almost just scrubbing a video, right?
The GAN model is the game environment. You're playing a neural network. The novelty is that there's no game engine and no rules; the network just learned how to represent the game, and you can play it.
The first link provided seems to need a very detailed human-provided cost function for specific development needs.
The second one is indeed interesting research and seems to be a combination of the prior learned motion mapping working in tandem with a generative model.
I suppose you could say that the automation of the dataset counts as "augmentation", but the difference here is that the dataset is just pixels and inputs rather than all that animation info and simulation data. Yes, a simulation is running, but the GAN only gets the pixels and the input.
There's a similarity there though; you're right. In either case, the explicit goal of the video you posted is to combat the runtime constraints of generative models. I'm not certain it's a fair comparison.
The latter video and sentdex's result both seem to generalize to unique scenarios not present in the training set. This may mean they are creating an efficient representation of the underlying data in order to predict future samples more easily than simply overfitting.
The top level comment here is a shallow dismissal and Randomoneh could have answered these questions themselves before throwing out a smug comment like "I fail to see novelty here" when it's at the very least the first large-scale GAN successfully trained on GTA V.
The first link exposes the trick employed by your model.
>animation info and simulation data
But did your model learn any of that?
>explicit goal of the video you posted is to combat runtime constraints
The trick to motion mapping is feeding a lot of data with accompanying inputs to build an atlas you can reference during playback.
>first large-scale GAN successfully trained on GTA V
It's really cool. The problem I had is in the presentation. I immediately felt insincerity bordering on scamming the audience, because I assume someone working in this field would know how the sausage is made. From the YT clip: "the shadow and reflection works", "modeling of physics works". Do they? Or did your model build an atlas of video frames it can play back according to the fed input? I'm guessing weather/time of day was locked when recording training data - perfect shadow and constant sun position for a nice reflection. Searching for 1:1 matches of generated output in the training set would be interesting and pretty revealing.
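For what it's worth, that check is straightforward to sketch: for each generated frame, find its nearest training frame and see how close the match is (stand-in data and sizes below; nothing here uses the actual project's frames):

    import numpy as np

    def nearest_training_frame(generated, training):
        """generated: (H, W, 3); training: (N, H, W, 3); float arrays in [0, 1]."""
        diffs = training - generated[None]                        # broadcast over N frames
        mse = (diffs ** 2).reshape(len(training), -1).mean(axis=1)
        best = int(np.argmin(mse))
        return best, float(mse[best])

    training = np.random.rand(1000, 48, 80, 3)   # stand-in training frames, low resolution
    generated = np.random.rand(48, 80, 3)        # one stand-in generated frame
    idx, err = nearest_training_frame(generated, training)
    print(f"closest training frame: #{idx}, per-pixel MSE {err:.4f}")
    # near-zero MSE for most generated frames would suggest playback of memorized
    # frames; consistently large MSE would suggest the frames are genuinely new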
> I immediately felt insincerity bordering on scamming the audience
MFW I read this. Jeez man. Model size is 173MB. It didn't just memorize every possible combo.
How the hell you went from our excitement about a fun project we shared on YT to accusing us of "scamming" the audience I really don't know. What a terribly rude and hateful attitude you have =/
Don't take it personally. Commenters on HN are famous for dismissing successful ideas (remember Dropbox?).
I have one question: you mentioned that the training data was 100GB. Was it the same resolution as what is output by the model (ignoring supersampling)?
I wouldn't call it scamming, but 173MB is not small at all. At the resolution of this model, you can easily fit the entire Titanic movie in 173MB. Maybe even have enough space for audio.
Furthermore, no one is saying the model "memorized every possible combo". However, imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them. Not that hard of a task, is it?
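A rough back-of-envelope for that point, with assumed numbers (the model's real internal resolution and any codec comparison are guesses, not measurements):

    movie_minutes = 194              # Titanic's runtime
    fps = 24
    frames = movie_minutes * 60 * fps            # ~279,000 frames
    budget_bytes = 173 * 1024 ** 2               # the reported model size

    print(f"{budget_bytes / frames:.0f} bytes per frame on average")   # ~650 bytes
    # A modern codec at a very low resolution can get close to that by storing sparse
    # keyframes plus small inter-frame differences -- the interpolation point above.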
Models don't care about simulating our "intention" properly. They care about fitting the input in the simplest way possible. Think about a model like a lazy worker merely trying to look like it's working.
None of this makes NNs less exciting, but it should tell us that you can't go from 0 to 60 in one step and hope the NN will have great insight into what it's doing.
We need models that make smaller conceptual jumps, i.e. models that understand 3D space, then models that understand transformations in 3D space, then models that understand cityscapes, etc.
It sounds like you and others are trying to clarify how this demo doesn't live up to your idealized, subjective expectations. No one is claiming this to be a revolutionary or even useful video game engine.
It's a neural network that recreates a limited, yet fully dynamic gameplay segment only based on player input. It's a really neat and fun project.
I think it's quite telling that you accuse me of having idealized, subjective expectations and then describe the demo as "limited yet fully dynamic gameplay". It rotates the car to the left or right depending on whether you press left or right.
It's super-interesting but it doesn't recreate limited fully dynamic gameplay. It doesn't recreate any sort of dynamic gameplay. That's your idealized, subjective interpretation.
The driving seems pretty dynamic to me. Maybe "fully" was a bit hyperbolic, as I can't really justify or quantify what that would entail. On the other hand, saying that it's not dynamic at all seems equally misguided. Also you seem to disregard the "limited" and "segment" qualifiers which was there for a reason.
> However, imagine you have a set of keyframes (maybe even multiple fragments per frame) and you need to interpolate between them. Not that hard of a task, is it?
Interestingly, the video artifacts of this model look somewhat similar to those from simple motion interpolation algorithms such as ffmpeg's minterpolate, especially during fast camera motion.
https://ffmpeg.org/ffmpeg-filters.html#minterpolate
I feel scammed when a practitioner of the art tries to sell me on his model "learning the physics of the simulation. Look, it even figured out where to put the shadow".
Have you seen the video? The author even goes as far as suggesting the technique might be useful for (generating?) entire operating systems at https://www.youtube.com/watch?v=udPY5rQVoW0&t=853s. That's just wild.
I suggested there could be a "future where many game engines are entirely or even mostly AI based like this. Or even things like operating system or other programs."
The thought here was just a wondering of what the future might be and if we might have far more AI based programs.
I still think the answer is a strong yes, this is a glimpse into the future. Nowhere did I say GameGAN would be that engine. You're just trying your hardest to hate.
Manipulative much? I don't hate you (well, so far), you aren't being attacked, I'm just noting what a few informed people here don't like about your video. No, they aren't trolls. And, yes, everyone has different level of tolerance to exaggerations, of course.
Odd, pretty sure it was you who misrepresented what I said in attempts to manipulate.
You were also the one who "exaggerat[ed]" my claims. I made a general statement about my thoughts on future software being AI-based rather than human-coded.
I still think that's indeed the inevitable future. It doesn't seem remotely outrageous or exaggerated. I never said GameGAN would be that software, but you seem to want to make that the case so you can put it down.
What makes you believe neural networks aren't or could not be deterministic? What makes you think NNs could not eventually produce far more robust, reliable, and secure operating systems?
Seems obvious to me, but I guess you're more informed than me :)
You, like many YouTubers, made completely exaggerated claims in your commentary. Your model fits a sequence of inputs to a video frame. But you say "wow look, it even models the movement of the sun!". It's pretty absurd.