Most of the magic is in the 475 MB model file downloaded through the Bash script...

dogboat · on Dec 12, 2024

And probably like 100 GB of training data or whatever :). HN had a heady debate at least once about whether that training data is source or not.

Sharlin · on Dec 12, 2024

I guess the training data should be called assets. Analogously to how you can download the Quake source, and build it and get a working executable, but without the (non-open) assets you cannot replicate what you get by buying the game.

fluoridation · on Dec 12, 2024

By that analogy, the executable code would be the game engine, the model that inference is done on would be the assets, and the training data would be... I guess the PSDs and high-poly models that get compressed and simplified to turn them into game assets.

Sharlin · on Dec 12, 2024

It depends on whether the analogue of playing a game is training or inference. If the former, then the training data are the assets, the training software is the game, and the resulting model is the game experience (or if you want something more concrete, savegames and recorded play I guess).

But yeah, if gameplay equals the use of a trained model, then the model is an asset bundle (the pak0.pak if you will) and training data is the original models, textures etc., and the training software is all the programs that are used in the asset production pipeline.

fluoridation · on Dec 12, 2024

Well, consider that training is a batch processing step, while inference is the step that's interactive. So training is more analogous to compilation than to gameplay.