
Because the expense is not really worth it - even GPU rendering (while around 3-4x faster than CPU rendering) is memory constrained compared to CPU rendering, and as soon as you try and go out-of-core on the GPU, you're back at CPU speeds, so there's usually no point doing GPU rendering for entire scenes (which can take > 48 GB of RAM for all geometry, accel structures, textures, etc.) given the often large memory requirements.

High-end VFX/CG usually tessellates geometry down to micropolygons, so you roughly have 1 quad (or two triangles) per pixel in terms of geometry density. That means you can often have > 150,000,000 polys in a scene, along with per-vertex primvars to control shading, and many textures (which can be paged fairly well with shade-on-hit).
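
For a rough sense of the memory involved, here's a back-of-envelope sketch (the per-vertex byte counts and the 50% accel-structure overhead are assumed round numbers for illustration, not measured figures):

    # Rough back-of-envelope for micropolygon-density geometry memory.
    # All numbers are illustrative assumptions, not measurements.
    n_quads = 150_000_000            # ~1 quad per pixel, plus occluded/offscreen geo
    verts_per_quad = 4               # ignore vertex sharing for a pessimistic bound
    bytes_position = 3 * 4           # float3 position
    bytes_normal = 3 * 4             # float3 normal
    bytes_primvars = 4 * 4           # a handful of per-vertex shading primvars
    bytes_per_vert = bytes_position + bytes_normal + bytes_primvars

    geo_bytes = n_quads * verts_per_quad * bytes_per_vert
    bvh_bytes = geo_bytes * 0.5      # assume accel structure ~50% of geometry size

    total_gib = (geo_bytes + bvh_bytes) / 2**30
    print(f"~{total_gib:.0f} GiB before textures, instancing or motion blur")

That lands in the mid-30s of GiB before you've touched textures, which is why the > 48 GB figure above isn't unusual.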

Using ray tracing pretty much means having all that in memory at once (paging of geo and accel structures generally sucks - it's been tried in the past) so that intersection / traversal is fast.

Doing lookdev on individual assets (i.e. turntables) is one place where GPU rendering can be used, as the memory requirements are much smaller, but only if the look you get is identical to the one you get using CPU rendering, which isn't always the case (some of the algorithms are hard to get working correctly on GPUs, e.g. volumetrics).

RenderMan (the renderer Pixar uses, and develops in Seattle) isn't really GPU-ready yet (they're aiming to release XPU this year, I think).




> Because the expense is not really worth it

I disagree with this takeaway. But full disclosure, I'm biased: I work on OptiX. There is a reason Pixar and Arnold and Vray and most other major industry renderers are moving to the GPU: the trends are clear, and it has recently become 'worth it'. Many renderers are reporting factors of 2-10 for production-scale scene rendering. (Here's a good example: https://www.youtube.com/watch?v=ZlmRuR5MKmU) There definitely are tradeoffs, and you've accurately pointed out several of them - memory constraints, paging, micropolygons, etc. Yes, it does take a lot of engineering to make the best use of the GPU, but the scale of scenes in production with GPUs today is already well past being limited to turntables, and the writing is on the wall - the trend is clearly moving toward GPU farms.


I write a production renderer for a living :)

So I'm well aware of the trade-offs. As I mentioned, for lookdev and small scenes, GPUs do make sense currently (if you're willing to pay the penalty of getting code to work on both CPU and GPU, and GPU dev is not exactly trivial in terms of debugging / building compared to CPU dev).

But until GPUs exist with > 64 GB RAM, for rendering large-scale scenes it's just not worth it given the extra burdens (increased development costs, heterogeneous sets of machines in the farm, extra debugging, support), so for high-end scale, we're likely 3-4 years away yet.


I used to write a production renderer for a living, now I work with a lot of people who write production renderers for both CPU and GPU. I’m not sure what line you’re drawing exactly ... if you mean that it will take 3 or 4 years before the industry will be able to stop using CPUs for production rendering, then I totally agree with you. If you mean that it will take 3 or 4 years before industry can use GPUs for any production rendering, then that statement would be about 8 years too late. I’m pretty sure that’s not what you meant, so it’s somewhere in between there, meaning some scenes are doable on the GPU today and some aren’t. It’s worth it now in some cases, and not worth it in other cases.

The trend is pretty clear, though. The size of scenes that can be done on the GPU today is large and growing fast, both because of improving engineering and because of increasing GPU memory speed & size. It's just a fact that a lot of commercial work is already done on the GPU, and that most serious commercial renderers already support GPU rendering.

It’s fair to point out that the largest production scenes are still difficult and will remain so for a while. There are decent examples out there of what’s being done in production with GPUs already:

https://www.chaosgroup.com/vray-gpu#showcase

https://www.redshift3d.com/gallery

https://www.arnoldrenderer.com/gallery/


The line I'm drawing is high-end VFX / CG is still IMO years away from using GPUs for final frame (with loads of AOVs and Deep output) rendering.

Are GPUs starting to be used at earlier points in the pipeline? Yes, absolutely, but they always were to a degree in previs and modelling (via rasterisation). They are gradually becoming more useable at more steps in pipelines, but they're not there yet for high-end studios.

In some cases, if a studio's happy using an off-the-shelf renderer with the stock shaders (so no custom shaders at all - at least until OSL is doing batching and GPU stuff, or until MDL actually supports production renderer stuff), studios can use GPUs further down the pipeline, and currently that's smaller-scale stuff from what I gather talking to friends who are using Arnold GPU. Certainly the hero-level stuff at Weta / ILM / Framestore isn't being done with GPUs, as they require custom shaders, and they aren't going to be happy with just using the stock shaders (which are much better than stock shaders from 6/7 years ago, but still far from bleeding edge in terms of BSDFs and patterns).

Even from what I hear at Pixar with their lookdev Flow renderer things aren't completely rosy on the GPU front, although it is at least getting some use, and the expectation is XPU will take over there, but I don't think it's quite ready yet.

Until a studio feels GPU rendering can be used for a significant amount of the renders they do (for smaller studios, the fidelity will be less, so the threshold will be lower for them), I think it's going to be a chicken-and-egg problem of not wanting to invest in GPUs on the farms (or even local workstations).


I think you’re right about the current state (not quite there, especially in raw $$s), but the potential is finally good enough that folks are investing seriously on the software side.

The folks at Framestore and many other shops already don’t do more than XX GiB per frame for their rendering. So for me, this comes down to “can we finally implement a good enough texture cache in optix/the community” which I understand Mark Leone is working on :).
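
To illustrate the kind of thing a texture cache buys you - this is a toy sketch of the general shade-on-hit paging idea, not the OptiX implementation being worked on - you keep only a bounded working set of texture tiles resident and evict the least recently used tile on a miss:

    from collections import OrderedDict

    class TileCache:
        """Toy LRU cache for texture tiles: only the working set stays resident."""
        def __init__(self, max_tiles, load_tile):
            self.max_tiles = max_tiles
            self.load_tile = load_tile          # callable: (texture_id, tile_xy) -> pixel data
            self.tiles = OrderedDict()

        def fetch(self, texture_id, tile_xy):
            key = (texture_id, tile_xy)
            if key in self.tiles:
                self.tiles.move_to_end(key)     # mark as recently used
                return self.tiles[key]
            data = self.load_tile(texture_id, tile_xy)   # miss: page the tile in
            self.tiles[key] = data
            if len(self.tiles) > self.max_tiles:
                self.tiles.popitem(last=False)  # evict the least recently used tile
            return data

The whole bet is that the resident working set stays well under VRAM size even when the full texture set is hundreds of GB on disk.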

The shader thing seems easy enough. I’m not worried about an OSL compiled output running worse than the C-side. Divergence is a real issue, but so many studios are now using just a handful of BSDFs with lots of textures to drive, that as long as you don’t force the shading to be “per object group” but instead “per shader, varying inputs is fine”, you’ll still get high utilization.
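
A minimal sketch of that "per shader, varying inputs" idea (in Python/NumPy purely for illustration; a real renderer would do this in a wavefront-style GPU kernel): group hit points by shader/BSDF ID so each batch runs one shader coherently while the textured inputs still vary per hit.

    import numpy as np

    def shade_wavefront(hit_shader_ids, hit_inputs, shaders):
        """Sort hits by shader ID so each shader runs once over a coherent batch.
        For simplicity, assume each shader maps an (n, k) input block to an
        (n, k) output block."""
        order = np.argsort(hit_shader_ids, kind="stable")
        ids_sorted = hit_shader_ids[order]
        inputs_sorted = hit_inputs[order]

        results = np.empty_like(inputs_sorted)
        # boundaries of each run of identical shader IDs
        starts = np.flatnonzero(np.r_[True, ids_sorted[1:] != ids_sorted[:-1]])
        ends = np.r_[starts[1:], len(ids_sorted)]
        for s, e in zip(starts, ends):
            shader = shaders[ids_sorted[s]]
            results[s:e] = shader(inputs_sorted[s:e])   # one shader, varying inputs

        # scatter results back to the original hit order
        out = np.empty_like(results)
        out[order] = results
        return out

The point is that divergence is per batch, not per hit, so a handful of BSDFs driven by lots of textures keeps the lanes busy.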

The 80 GiB parts will make it so that some shops could go fully in-core. I expect we’ll see that sooner than you’d think, just because people will start doing interactive work, never want to give it up, and then say “make that but better” for the finals.


GPUs do exist with 64+ GB of RAM, virtually. A DGX-2 has distributed memory where you can see the entire 16x32 GB address space, backed by NVLink. And that technology is now 3 years old, and capacities are even higher now.


Given current consumer GPUs are at 24 GB I think 3-4 years is likely overly pessimistic.


They've been at 24 GB for two years though - and they cost an arm and a leg compared to a CPU with a similar amount.

It's not just about them existing, they need to be cost effective.


Not anymore. The new Ampere-based Quadros and Teslas just launched with up to 48 GB of RAM. A special datacenter version with 80 GB has also already been announced: https://www.nvidia.com/en-us/data-center/a100/

They are really expensive, though. But chassis and rackspace also aren't free. If one beefy node with a couple of GPUs can replace half a rack of CPU-only nodes, the GPUs are totally worth it.

I'm not too familiar with 3D rendering, but in other workloads the GPU speedup is so huge that if it's possible to offload to the GPU, it makes sense to do it from an economic perspective.


Hashing and linear algebra kernels get much more speedup on a GPU than a VFX pipeline does. But I am glad to see reports here detailing that the optimization of VFX is progressing.


Desktop GPUs could have 64GB of GDDR right now but the memory bus width to drive those bits optimally (in primary use case of real-time game rendering, not offline) would up the power and heat dissipation requirements beyond what is currently engineered onto a [desktop] PCIE card.

If 8k gaming becomes a real thing you can expect work to be done towards a solution, but until then not so much.

Edit: added [desktop] preceding PCIE


There are already GPUs with > 90 GB RAM? The DGX A100 has a version with 16 A100 GPUs, each with 90 GB... that's 1.4 TB of GPU memory on a single node.


I should also point out that ray traversal / intersection costs are generally only around 40% of the costs of extremely large scenes, and that's predominantly where GPUs are currently much faster than CPUs.
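
To make that concrete with Amdahl's law (the 10x traversal speedup below is just an assumed figure for illustration): if traversal / intersection is only ~40% of the frame, even a huge GPU win there caps the overall gain until the rest of the work moves over too.

    def overall_speedup(fraction_accelerated, local_speedup):
        """Amdahl's law: speedup of the whole when only part of the work gets faster."""
        return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / local_speedup)

    # Traversal/intersection ~40% of frame time, assume the GPU does it 10x faster:
    print(overall_speedup(0.4, 10.0))   # ~1.56x overall, unless shading etc. also speed up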

(I'm aware of the OSL batching/GPU work that's taking place, but it remains to be seen how well that's going to work).

From what I've heard from friends in the industry (at other companies) who are using GPU versions of Arnold, the numbers are nowhere near as good as the upper numbers you're claiming when rendering at final fidelity (i.e. with AOVs and Deep output), so again, the use-cases - at least for high-end VFX with GPU - are still mostly for lookdev and lighting-blocking iterative workflows from what I understand. That's still an advantage and provides clear benefits in terms of iteration time over CPU renderers, but it's not a complete win, and so far only the smaller studios have started dipping their toes in the water.

Also, the advent of AMD Epyc has finally thrown some competitiveness back to CPU rendering, so it's now possible to get a machine with 2x as many cores for close to half the price, which has given CPU rendering a further shot in the arm.


Dave, doesn’t that video show more like “50% faster”? Here’s the timecode (&t=360) [1] for the “production difficulty” result (which really doesn’t seem to be, but whatever).

Isn’t there a better Vray or Arnold comparison somewhere?

As in my summary comment, an A100 can now run real scenes, but will cost you ~$10k per card. For $10k, you get a lot more threads from AMD.

[1] https://m.youtube.com/watch?v=ZlmRuR5MKmU&t=360


What do you mean by a lot more threads? Are you comparing an epyc?


Yeah, that came off clumsily (I’d lost part of my comment while switching tabs on my phone).

An AMD Rome/Milan part will give you 256 decent threads on a 2S box with a ton of RAM for say $20-25k at list price (e.g., a Dell PowerEdge without any of their premium support or lots of flash). By comparison, the list price of just an A100 is $15k (and you still need a server to drive the thing).

So for shops shoving these into a data center, they still need to do a cost/benefit tradeoff of "how much faster is this for our shows, can anyone else make use of it, how much power do these draw...". If anything, the note about more and more software using CUDA is probably as important as "ray tracing is now sufficiently faster", since the lack of reuse has held them back (similar story for video encoding: if you've got a lot of CPUs around, it was historically hard to beat for $/transcode).
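
As a crude illustration of that cost/benefit math (the hardware prices are the ones above; the GPU host cost and the 2x render-speed factor are purely assumed placeholders, and the whole thing hinges on what that factor really is for your shows):

    # Toy $-per-throughput comparison; the speedup and host cost are assumptions.
    cpu_node_price = 22_500          # dual-socket Rome/Milan box, ~256 threads (list-ish)
    gpu_card_price = 15_000          # A100 list price mentioned above
    gpu_host_price = 10_000          # assumed cost of a server to drive the card
    gpu_speed_vs_cpu_node = 2.0      # assumed: how many CPU-node-equivalents the GPU node renders

    cpu_cost_per_unit = cpu_node_price / 1.0
    gpu_cost_per_unit = (gpu_card_price + gpu_host_price) / gpu_speed_vs_cpu_node

    print(f"CPU node: ${cpu_cost_per_unit:,.0f} per unit of render throughput")
    print(f"GPU node: ${gpu_cost_per_unit:,.0f} per unit of render throughput")
    # Power, rack space, and software reuse shift this further, as noted above.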


The reason I asked is I did a performance trade-off with a V100 and a dual 64-core Epyc Rome, and the V100 won handily for my tasks. That obviously won't always be the case, but in terms of threads you're now comparing 256 to 5000+, though obviously not apples to apples.


Is the high-priced 256-thread part that interesting for rendering? You can get 4 of the 64-thread parts on separate boards, and each one will have its own 8-channel DDR instead of having to share that bandwidth. Total performance will be higher for less or the same money. The power budget will be higher, but only by a couple of dollars a day, at most. But I haven't been involved in a cluster for some time, so not really sure what is done these days.


Yes, this example isn’t quite as high as the 2-10x range I claimed, but I still liked it as an example because the CPU is very beefy, and it’s newer and roughly the same list price as the GPU being compared. I like that they compare power consumption too, and ultimately the GPU comes out well ahead. There are lots of other comparisons that show huge x-factors, this one seemed less likely to get called out for cherry picking, and @berkut’s critique of texture memory consumption for large production scenes is fair... we’re not all the way there yet. But, 50% faster is still “worth it”. In the video, Sam mentions that if you compare lower end components on both sides, the x-factor will be higher.


> There is a reason Pixar and Arnold and Vray and most other major industry renderers are moving to the GPU

The reason is that this software needs to be sold to many customers, and a big share of those studios do advertising and series work. GPU rendering is perfect for them, as they don't need / can't afford large-scale render farms.

About your example: that's not honest. It's full of instances and is a perfect use case for a "wow" effect, but it's not a production shot. Doing a production shot requires complexity management over the long run, even for CPU rendering. On that front, the GPU is more "constrained" than the CPU, so management is even more complex.


Nice to have an industry insider perspective on here ;)

Can you speak to any competitive advantages a vfx-centric gpu cloud provider may have over commodity AWS? Even the RenderMan XPU looks to be OSL / Intel AVX-512 SIMD based. Thanks!

Supercharging Pixar's RenderMan XPU™ with Intel® AVX-512

https://www.youtube.com/watch?v=-WqrP50nvN4


One potential difference is that the input data required to render a single frame of a high end animated or VFX movie might be several hundred gigabytes (even terabytes for heavy water simulations or hair) - caches, textures, geometry, animation & simulation data, scene description. Often times a VFX centric cloud provider will have some robust system in place for uploading and caching out data across the many nodes that need it. (https://www.microsoft.com/en-us/avere)

And GPU rendering has been gaining momentum over the past few years, but the biggest bottleneck until recently was available VRAM. Big-budget VFX scenes can often take 40-120 GB of memory to keep everything accessible during the raytrace process, and unless a renderer supports out-of-core memory access, the speedup you may have gained from the GPU gets thrown out the window from swapping data.


As a specific example, Disney released the data for rendering a single shot from Moana a couple of years ago. You can download it here: https://www.disneyanimation.com/data-sets/?drawer=/resources...

Uncompressed, it’s 93 GB of render data, plus 130 GB of animation data if you want to render the entire shot instead of a single frame.

From what I’ve seen elsewhere, that’s not unusual at all for a modern high end animated scene.


To reinforce this, here is some discussion of average machine memory size at Disney and Weta two years ago:

https://twitter.com/yiningkarlli/status/1014418038567796738


Oh, and also, security. After the Sony hack several years ago, many film studios have severe restrictions on what they'll allow off-site. For upcoming unreleased movies, many studios are overly protective of their IP and want to mitigate the chance of a leak as much as possible. Often times complying with those restrictions and auditing the entire process is enough to make on-site rendering more attractive.


Did you really just say that one frame can be in the TB range??

Didn't you guys get the memo from B. Gates that no one will ever need more than 640k?


Because GPUs in datacenters are expensive.

Not only that, they are massive and kick out a whole bunch of heat in new and interesting ways. Worse still, they depreciate like a mofo.

The tip-top renderbox of today is next year's comp box. A two-generation-old GPU is a pointless toaster.


> while around 3/4 x faster than CPU rendering

My understanding is that for neural networks, the speedup is much more than 4x. Does anyone know why there's such a difference?


Sure. Training neural nets is somewhat analogous to starting at the top of a mountain looking for the lowest of the low points of the valley below. But instead of being in normal 3D space, you might have 1000 dimensions determining your altitude, so you can't see where you're going, and you have to iterate and check. But ultimately you just calculate the same chain of the same type of functions over and over until you've reached a pretty low point in the hypothetical valley.

OTOH, VFX rendering involves a varying scene with moving light sources, cameras, objects, textures, and physics - much more dynamic interaction. This is a gross simplification, but I hope it helps.
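
Put another way, a training step is mostly the same dense matrix multiplies repeated identically for every batch, which is exactly what GPUs are built for. A toy sketch of one step (NumPy, purely illustrative):

    import numpy as np

    # One training step of a tiny two-layer net: a fixed, branch-free chain of
    # dense matmuls and elementwise ops, repeated the same way for every batch.
    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((512, 512)), rng.standard_normal((512, 10))

    def train_step(x, y, lr=1e-3):
        global W1, W2
        h = np.maximum(x @ W1, 0.0)          # dense matmul + ReLU
        pred = h @ W2                        # dense matmul
        grad_pred = 2.0 * (pred - y) / len(x)
        grad_W2 = h.T @ grad_pred            # dense matmul
        grad_h = grad_pred @ W2.T            # dense matmul
        grad_W1 = x.T @ (grad_h * (h > 0))   # dense matmul
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2

    # A path tracer, by contrast, chases incoherent rays through a BVH with lots
    # of data-dependent branching and memory access, so threads diverge far more.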



