That's unreal. On what kind of graphics hardware, though? Seems like it probably offloads most of the work on GPU whereas we'd have had to do most of it in software on HW weak enough that 4KB size actually mattered. And probably not achieve this demo.
Edit: I'm wrong about the two triangles. From the .nfo-file:
for those wondering, this a (too) low density flat mesh displaced with
a procedural vertex shader. there arent any texturemaps for texturing,
instead texturing (and shading) is defferred and computed procedurally
in a full screen quad. this means there is zero overdraw for the quite
expensive material at the cost of a single geometry pass. then another
second full screen quad computes the motion blur. camera movements are
computed by a shader too and not in the cpu, as only the gpu knows the
procedural definition of the landscape.
Thanks for detailed response. I figured it mostly did GPU stuff. So, real computing necessary here is a massively-parallel chip with generic and custom hardware with a bunch of memory plus a regular core using 4KB on other end. I think a more interesting challenge would be to force use of a subset of GPU functions or memory plus tiny memory on CPU side. I don't follow demoscene close enough to know if they subset GPU's like that. Idea being making them run closer to the old Voodoo or pre-GeForce GPU's to see just how much 2D or 3D performance once could squeeze out of it.
Tricks could have long-term benefit given any emerging FOSS GPU is more likely to be like one of the older ones given complexity of new ones. I'd clone one like SGI's Octane ones they used to do movies on with mere 200MHz processors. Meanwhile, similar tricks might let one squeeze more out of the existing, embedded GPU's in use. Maybe subset a PC GPU in demoscenes like one of the smartphone GPU's. Yeah, that's got some interesting potential.
You seem to think that GPU programming is somehow easy. You should try it and see what you think.
Yes, there is massive amount of power available but it's not easy to use effectively. You need a different mental model how things work, there's very little shared state and all the algorithms used have to match the model of computation.
Using the GPU almost exclusively, generating everything procedurally is a massive accomplishment and much more difficult than "normal" CPU+GPU programming or using just the CPU.
I do not share your view that this would be somehow less impressive because it uses the GPU.
I used to do GPU programming. Brief foray into it for game programming plus a then-new field called "GPGPU" pushing its limits. Think I implemented some crypto or physics stuff on one. I've followed some of the recent efforts.
My points of comparison are what they're doing vs what it's designed to do with what vs what other people do with that and other hardware. It looks great with lots of efficiency. I'll give them that. It's just way less impressive to me given they're using a powerful graphics card to mostly do what it's designed to do plus their innovation.
Pre "GPGPU" era of mostly fixed function 3d accelerators is hardly comparable to modern programmable GPUs.
> It's just way less impressive to me given they're using a powerful graphics card to mostly do what it's designed to do ...
This demo isn't at all what the GPU is "designed to do". The all-procedural graphics is way different from drawing artist-generated 3d models from memory while being orchestrated by the CPU. While it is more commonplace today, this demo was pioneering work in "all GPU" procedural graphics.
"Pre "GPGPU" era of mostly fixed function 3d accelerators is hardly comparable to modern programmable GPUs."
Which people used to do things they weren't designed for at all in so-called GPGPU work. The results defaulted on really, clever work. It's why I brought it up.
"The all-procedural graphics is way different from drawing artist-generated 3d models from memory while being orchestrated by the CPU. While it is more commonplace today, this demo was pioneering work in "all GPU" procedural graphics."
This is where I likely slipped up. I forgot how old this one was. I retract that claim then.
From the point of view of the hackers who programmed Spacewar on the PDP-1, the C64 is special purpose hardware with a powerful graphics card designed to make it trivial to implement Space Invaders.
The .exe is 4K (it has been compressed using Crinkler), not the application's RAM requirements. The game .kkrieger for example is a 96K .exe, but uses several hundred MB of RAM when run.
Also, the strict size requirements can interfere with execution speed. From the .nfo again:
believe it or not, this was running at 30 fps in a gefoce 7900 at some
point, but size optimizations forced us to ask you for a pretty decent
graphics card, like a geforce 8800gtx or hd4850. please, make sure you
have d3d9_33.dll somewhere there. also, you only need windows xp.
Oh yeah, I forgot about that. I wonder what this one's runtime in RAM is. Regarding GPU quote, that's exactly the sort of thing I'm talking about. It's sort of a cheat where a massive amount of resources are used in one place to reduce a tiny amount in another. An impressive optimization requires little to no extra resources in B when optimizing A. There's some types that straight-up can't seem to have that tradeoff. Yet, the more constrained demo scenes were forced to figure out a bunch of them that worked.
So, I think there's potential for GPU subsets or CPU/GPU tradeoffs to make for interesting opportunities for people to show off brilliance.
>Regarding GPU quote, that's exactly the sort of thing I'm talking about. It's sort of a cheat where a massive amount of resources are used in one place to reduce a tiny amount in another.
Since the demo was originally entered in the 4K competition at the Breakpoint 2009 demo party, it had to run on the computer designated to run the competition's entries. So it's not like it could require an arbitrarily powerful GPU.
Fair enough. The spec requirements I'm mentioning would apply to people setting competition requirements more than the authors. The authors should of course work within the constraints for any particular competition. They can still try my challenge on the side.
" Intel Core2Quad at 2.66GHz, 2GB of RAM, with a NVidia GeForce 295 GTX with 2x896MB of VRAM. "
Double CPU and more GPU than what I'm writing this on but half the RAM. Beefy indeed. .exe size is still impressive and all given what they're doing.
That's the thing, though.... If this were an arbitrarily sized demo, I (and probably most people) would agree with you about the GPU stuff. But it's not arbitrarily sized, it's all in 4K. And it's from 2009.
Programming is all about finding and exploiting ways to cheat.
I remember overhearing a conversation in the Sun cafeteria about how the Aviator flight simulator only had one-half of a 3d model of the airplane, and it just reflected it to get the other half. They complained that was cheating, but that's just how it is!
Oh sure. It's one approach. We have to rate the cheats or honest work somehow. I think one way is to look at both what's produced with what type and number of resources are utilized. The constraints each provide plus what's achieved with them vs some baseline might be a good example. Baseline maybe determined after first round of submissions.
Btw, I'd probably have left off Space Invaders for exact reason you mentioned. Curious to know what you find to be most impressive demo on that system, though.
You wanted optimized code size and optimized performance?
I mean, sure, but think about how big 4KB is, the tricks that are being used to create the scenes are crazy hacks using default Windows sound files and literally anything the executable can reference on the cheap.
Procedural content generation is really expensive (in general), but that's the beauty of it. You find a way to abstract the content into an algorithm, and then you can reduce the size of the assets, but you pretty much always need to pay the price somewhere.
But hey, I understand the sentiment, I wish Slack didn't consume 2 GB of RAM on my machine.
"Being allowed to use 300MB of RAM doesn't strike me as very limiting."
BOOM! I knew it was going to be huge. That's a beefy GPU + 300MB in RAM + pregenerating. I'd have... made sacrifices to have that even in the Half-Life 1 days. :)
The wink face makes it seem like you think this is easy because using a GPU to execute the program is allowed. No?
Edit: just read your other comment about real challenges in the C64 subset of the demoscene. That's like "You set a record in a 1600m race? For a real challenge, set a record in a marathon." It's just arbitrarily moving the totally legitimate goalposts to a different challenge because you prefer it.
>How much harder would it be if on a simple GPU from the late 90's
It would be impossible since pixel shaders didn't exist until the 2000's ;-)
As for software rendering: Since a pixel shader is essentially a program executed for every pixel, it's trivially portable to the CPU: Just turn it into a function and call it for every pixel on the screen. Making it fast is another matter altogether though.
Nitpicking: in the offline rendering world RenderMan had shaders ca. 1990[1], and graphics hackers got around to compiling those for research GPUs in the 90s too[2]. (Hardware had programmability equivalent to current shaders early as well [3], but no compilers for fancy shading languages)
That's some neat stuff. Especially PixelFlow. It had some clever, architectural decisions in terms of memory and computing primitives. Such schemes are already re-appearing in deep learning chips with old work like this maybe having some ideas waiting to be re-discovered.
"It would be impossible since pixel shaders didn't exist until the 2000's ;-)"
Lmao. You got me there.
"Just turn it into a function and call it for every pixel on the screen. Making it fast is another matter altogether though."
I was imagining it took up many MB of memory and massive cycles even on a multicore CPU. Suddenly, one faces tough decisions about organization, resolution, primitives, techniques used, algorithms, and so on. Gets really, really hard to make tiny and fast stuff without that GPU doing heavy lifting. :)
A soft renderer wouldn't fit in 4096 bytes, too. The overwhelming preference of the demoscene when doing PC filesize compos is to lean on OS provisions in order to free up space for more algorithms. Hence you have demos that use files in C:\Windows as source data. Likewise, you have demos for older computers that require aftermarket RAM upgrades and employ preprocessing techniques that require modern computing resources. In unrestricted compos modern game engines get employed these days, too, and while many of those entries suffer the downside of having a low entry bar, good work has been made too.
Pointing at the GPU as a particular cheat or a make-easy button is not relevant to the conversation, in this light. Having a Gravis Ultrasound was also a cheat back in the day ;) It's all fairly arbitrary stuff, and in the end, the point is to present something cool running on the hardware and within the nominal restrictions, even if you get tricky to do so.
"It's all fairly arbitrary stuff, and in the end, the point is to present something cool running on the hardware and within the nominal restrictions, even if you get tricky to do so."
Another good, detailed perspective on it. Appreciate it. I'll especially agree with the part I quoted. :)
Response to edit: more like they couldn't pull it off so they asked people to buy a better graphics card. That's in their own race. Then I pointed out doing graphics operations, mostly rendering, almost entirely on a graphics card designed for that was barely an accomplishment vs stuff like in C64 demoscene. .kkrieger had me way more impressed due to all the elements involved vs size. So, I suggested subsetting or constraining the graphics card so its hundreds of millions of transistors don't just hand people their victories. Plus allow more creativity.
I have nothing but props for that one. A true, full-stack or whole-system coder in the way the term should be used. He's also about halfway to Frenchgeek's grand challenge. Maybe we need to take up funding for him to put it all on an ASIC at 0.35 micron.
I'm actually working toward that challenge. Well, that plus synthesis, verification, and analog tech to create it. Glad we agree on high end for a demo challenge. :P