GPU Ray Tracing the Wrong Way (joshbarczak.com)
134 points by ingve on Sept 4, 2016 | 11 comments


Makes you wonder how many cool things people would have written for the native Intel GPU instruction set if supported interfaces for it existed. All the programming information has been out there in Intel-provided open source drivers for many years.


There is an assembler in the intel-gpu-tools package [1].

[1] https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/


I don't think that person wrote his own assembler, so there must be existing assembler toolkits for GEN already.

Some googling finds https://software.intel.com/en-us/articles/introduction-to-ge...


He does mention writing an assembler called HAXWell, and there are code snippets of it in the article.


Not to be pessimistic, but that was a lot of effort to get what looks like a ~25% improvement over just using the CPU. Is this a typical performance profile for doing ray tracing in OpenGL? I expected it to be tens of times faster than a CPU on commodity hardware.


> Is this a typical performance profile for doing ray tracing in OpenGL?

Ray tracing on the GPU is usually a lot faster than on the CPU. The problem is that the post is comparing the performance of an Intel Core i3-4010U with the performance of that CPU's integrated GPU, an Intel HD 4400.

Some benchmarks: a GeForce GTX 1080, as a current high-end GPU, has a PassMark score of 12,618; an Intel HD 4400 has a PassMark score of 546. In the 3DMark11 benchmark, a GTX 1080 reaches 24,390 points; the Intel HD 4400 used here reaches 740 points.

So what we're really seeing is that using a pretty bad GPU, designed by a CPU manufacturer, is barely faster than using the CPU. But using that GPU as if it were a CPU is notably faster than using it via a graphics shader interface.
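
For a rough sense of scale, the ratios implied by the scores quoted above can be worked out directly. This is only a minimal sketch: the function name is made up for illustration, and the numbers are just the ones quoted in this comment, not re-measured.

```python
# Benchmark scores as quoted in the comment above (not re-measured here).
SCORES = {
    "PassMark": {"GTX 1080": 12618, "HD 4400": 546},
    "3DMark11": {"GTX 1080": 24390, "HD 4400": 740},
}


def score_ratio(benchmark: str) -> float:
    """Return how many times higher the GTX 1080 scores than the HD 4400."""
    scores = SCORES[benchmark]
    return scores["GTX 1080"] / scores["HD 4400"]


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: GTX 1080 scores about {score_ratio(name):.1f}x the HD 4400")
```

That works out to roughly 23x on PassMark and 33x on 3DMark11, so a ~25% win on the HD 4400 says little about what a discrete card would do.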


I quibble with your label of "bad GPU," though the relative magnitude of resources and performance is right. Modern GEN is actually a good-to-great GPU architecture (an assessment widely shared in the GPU architecture community today); it's just not given many resources compared to a large discrete GPU, especially in lower-end SKUs like this.

Work like this is interesting because it digs into ways (at the architectural level, well below the standard programming models of OpenGL/D3D/OpenCL) in which GEN is potentially a more efficient general-purpose programming target than NVIDIA architectures, but is hamstrung by a commitment to a programming model (OpenCL) that is strongly tied to an NVIDIA-style execution model.


> Is this a typical performance profile for doing ray tracing in OpenGL? I expected tens of times faster than a CPU on commodity hardware.

SIMD instructions on modern CPUs are getting really good, within the ballpark of GPU speed in some cases. Here's the classic paper [1] on this, and the results have gotten even better since then with AVX2, etc.

[1]: http://sbel.wisc.edu/Courses/ME964/Literature/LeeDebunkGPU20...


The device he is targeting is a very tiny integrated GPU, with only as much bandwidth as the CPU. When people move work to GPUs, they typically buy something at least an order of magnitude beefier to run it on. The current top-of-the-line GPU you can buy has >20x the bandwidth and >80x the FP throughput of his machine.


This was just a cool hack for fun, but there's another aspect to it: power efficiency.

It might be getting only 25% more throughput, but using the GPU instead of the CPU may consume 2x to 10x less power. For some specialized use cases, the difference can be even bigger.

He probably didn't measure it, but this is something that people writing software for embedded systems-on-chip (SoCs) think about often (this includes mobile apps; look at the number of reviews in app stores that mention battery life). It's very important, as battery life and thermal throttling make a lot of difference in the mobile/embedded/automotive/aerospace industries.
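
As a back-of-the-envelope illustration of the point above: if throughput goes up by some factor while power drops by another, the throughput-per-watt gain is simply their product. The sketch below uses a hypothetical helper name, and the 2x and 10x power figures are the guesses from this comment, not measurements.

```python
def perf_per_watt_gain(speedup: float, power_reduction: float) -> float:
    """Relative throughput-per-watt of the GPU path versus the CPU path.

    speedup:         GPU throughput divided by CPU throughput (1.25 = 25% faster)
    power_reduction: CPU power divided by GPU power (2.0 = GPU draws half as much)
    """
    if speedup <= 0 or power_reduction <= 0:
        raise ValueError("both factors must be positive")
    return speedup * power_reduction


if __name__ == "__main__":
    # Assumed power reductions taken from the comment's 2x to 10x guess.
    for reduction in (2.0, 10.0):
        gain = perf_per_watt_gain(1.25, reduction)
        print(f"{reduction:g}x less power -> {gain:g}x throughput per watt")
```

Under those assumptions, a 25% speedup would translate into 2.5x to 12.5x more work per watt, which is why the result can matter even when the raw speedup looks small.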


It's still really cool just for the low-level hardcoreness!



