Do you know of any good books or references where I could learn more about these things?
The books that are usually recommended seem to be CUDA-centric and out of date. I'm interested in learning the more general concepts you talk about in your answer, so that I can effectively write e.g. Monte Carlo simulations on arbitrary GPU hardware. (I don't have an Nvidia GPU!)
> The books that are usually recommended seem to be CUDA-centric and out of date.
The CUDA ones are usually the best, because at least they use a modern API. The other recommendations I've gotten are 1980s material on vector computers and the CM-2.
It turns out that a huge community of high-performance programmers experimented with all of these concepts in the 1970s, 80s, 90s, and 00s, long before GPUs existed. All of those concepts still work today on modern GPUs.
Looking up the right keywords, such as "CREW-PRAM algorithms" (Concurrent-Read Exclusive-Write Parallel RAM model), immediately gives you plenty of results. (Example: I just searched for 'CREW-PRAM DFS' and got https://core.ac.uk/download/pdf/82490222.pdf).
The key is knowing the "old word" for the GPU: PRAM, the Parallel-RAM model. That's how programmers from the 1970s, 1980s, and 1990s referred to what we'd now call GPU-style algorithms.
Newer articles / books talk about GPUs directly.
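To make the mapping concrete, here's a tiny sketch of my own (not from any of those papers): the PRAM habit of "one virtual processor per data element" is literally how a CUDA kernel is written. Every thread may read the same scalar (the Concurrent-Read part) but writes only its own output slot (the Exclusive-Write part):

    // One virtual processor per element, CREW style:
    // all threads read the same 'a' (Concurrent Read),
    // each thread writes only its own y[i] (Exclusive Write).
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Launch with enough threads to cover all n elements, e.g.:
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);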
--------------
I'd say the fundamentals were covered back in the 1970s, in papers such as "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations" by Kogge, which led to the development of what is today called the prefix sum.
Of course, CUDA code is clearer than a theoretical CREW-PRAM discussion, so perhaps it's easier if you just read NVidia's GPU Computing Gems to cover the same material. Still, I find that the 1970s writing is often better for a beginner: back then fewer programmers knew how parallel programming worked, but they were clearly more math-heavy than today. So I find that reading old articles like that one helps my personal thinking patterns.
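For example, the prefix sum in PRAM terms is just log2(n) fully-parallel rounds where element i pulls in the value from element i - offset. Here's a rough single-block sketch of that idea in CUDA (the Hillis/Steele formulation of the same concept; I'm writing it from memory, so use Thrust or CUB for real work):

    __global__ void inclusive_scan_block(const float *in, float *out, int n)
    {
        extern __shared__ float tmp[];          // one slot per thread
        int i = threadIdx.x;
        tmp[i] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // log2(blockDim.x) rounds; every thread participates in every round
        for (int offset = 1; offset < blockDim.x; offset *= 2) {
            float v = (i >= offset) ? tmp[i - offset] : 0.0f;
            __syncthreads();                    // everyone reads before anyone writes
            tmp[i] += v;
            __syncthreads();
        }
        if (i < n) out[i] = tmp[i];
    }

    // Single-block launch, e.g.:
    //   inclusive_scan_block<<<1, 256, 256 * sizeof(float)>>>(d_in, d_out, n);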
---------
Ah right, the recommendation. I'd say "Data Parallel Algorithms" by Hillis / Steele (Communications of the ACM, 1986) is an excellent introduction to general-purpose SIMD compute. It was written for the CM-2, an ancient supercomputer that no longer exists, but the PRAM style applies to GPU algorithms today.
It's like 15 pages, but it really opens your eyes to the possibilities. A lot of CUDA material is very specific to NVidia GPUs (important details like bank conflicts and shared memory, which you absolutely should learn about... but only after you've learned the parallel way of thinking / the PRAM model / etc.).
Oh, in case it isn't clear: CUDA is just the most popular GPGPU programming language right now. There's also ISPC, OpenCL, OpenACC, DPC++, SYCL, and many others.
They are all closely related to the PRAM model, however. So studying algorithms isn't really about learning CUDA details, but about learning the generic (and cross-language) concepts of parallel programming.
------
So it really doesn't matter whether you're reading CUDA or C* (the 1980s language for the old CM-2 supercomputer). They're both PRAM in concept and therefore somewhat interchangeable in your head.
It helps to know the GPU quirks for the highest levels of optimization (wavefront programming, uniform branches, __shared__ memory), but you can learn those details after learning the generic PRAM material that has been known for decades.
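To show what I mean by those quirks, here's a quick sketch of my own (it assumes blockDim.x is a power of two): the PRAM idea is just a tree reduction, while the GPU-specific part is staging data in __shared__ memory and keeping the branch aligned to contiguous threads so whole warps take the same path:

    __global__ void block_sum(const float *in, float *out, int n)
    {
        extern __shared__ float sdata[];
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        sdata[tid] = (i < n) ? in[i] : 0.0f;    // stage in fast __shared__ memory
        __syncthreads();

        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)                   // contiguous threads take the same path
                sdata[tid] += sdata[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = sdata[0];         // one partial sum per block
    }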
Hey, dragon, thanks so much for your thoughtful replies. This thread turned into a bonanza of revelations about the future of graphics APIs. And I have to concur: CPUs are getting so inexpensive and powerful that proprietary renderers (designed to run on farms as well) simply target vector extensions for parallelism.
Regarding learning GPU architectures and programming: the Graphics Gems books usually have an introductory section devoted to compute. But you're on your own regarding streaming, tracing, tuning, and multi-GPU. All still very much dark arts ;)