I came across this while trying to learn more about CUDA and OpenCL. Specifically, I was trying to find information on bending GPUs to be useful for the actor model... zombies were an awesome bonus!
It's not really suited to CUDA/OpenCL; for the purposes of this discussion, we can treat the two as the same because the execution model of actual kernels is almost identical. The problems are:
- the size of the grid is fixed at kernel launch time, so you can't arbitrarily spawn more work
- the completion of one block cannot depend on the execution of another block, which means inter-block communication within a kernel is forbidden
So you could do some sort of block-local actor operation, then do an exchange of the halo cells between blocks in a second kernel (a kernel boundary is the only global synchronization point allowed in CUDA/OpenCL), but that seems really painful, especially if actors are sparse within your grid. A work queue approach would probably work better; you'd certainly get better utilization and load balancing than you would by spatially partitioning a large grid.
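To make the work queue idea a bit more concrete, here is a minimal host-side sketch in C11. It is not GPU code, and every name in it (work_queue, wq_init, wq_claim, worker_run) is made up for illustration. The doubling step is just a placeholder for whatever an item's real work would be. It only shows the claiming pattern: a fixed set of workers pulls the next item from a shared atomic counter instead of being assigned a fixed region of the grid. It does not handle workers pushing new items, which an actor system would also need.

```c
#include <stdatomic.h>
#include <stddef.h>

#define WQ_EMPTY ((size_t)-1)   /* sentinel: no work left (hypothetical) */

typedef struct {
    atomic_size_t next;   /* index of the next unclaimed item */
    size_t count;         /* total number of items in the queue */
} work_queue;

static void wq_init(work_queue *q, size_t count) {
    atomic_init(&q->next, 0);
    q->count = count;
}

/* Atomically claim one item; returns its index or WQ_EMPTY. */
static size_t wq_claim(work_queue *q) {
    size_t i = atomic_fetch_add(&q->next, 1);
    return i < q->count ? i : WQ_EMPTY;
}

/* One long-lived worker: keep claiming items until the queue is drained.
   Returns how many items this worker processed. */
static size_t worker_run(work_queue *q, const int *in, int *out) {
    size_t done = 0;
    for (;;) {
        size_t i = wq_claim(q);
        if (i == WQ_EMPTY)
            break;
        out[i] = in[i] * 2;   /* placeholder for the real per-item work */
        ++done;
    }
    return done;
}
```

Because each worker takes the next item whenever it is free, a slow item only delays the worker that claimed it, which is where the load balancing comes from.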
While extremely limited, I thought shared memory enabled interblock communications...? It's important to note I have no applications for this, just researching.
No, shared memory is only for intra-block communication. __syncthreads() only guarantees that every thread in a block has reached a particular point, not every block in the grid.
Take Fermi, for example: you can potentially have 128 blocks running concurrently (8 blocks per multiprocessor, 16 multiprocessors on GF110), but you can launch a grid of 65535x65535 blocks in a single kernel. Most of those blocks are not resident at any given moment, so if you try to do arbitrary global synchronization, you'd have a state explosion (PDF: http://www.gdiamos.net/papers/stateExplosion.pdf ). The best way to solve a problem with significant interaction between data elements is to use a persistent work queue (as described in PDF: http://www.tml.tkk.fi/~timo/publications/aila2009hpg_paper.p... ).
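For a sense of scale, the arithmetic behind those numbers can be sketched in plain C. The function names are hypothetical, and the inputs are just the figures quoted above (8 blocks per multiprocessor, 16 multiprocessors, a 65535x65535 grid).

```c
#include <stdint.h>

/* Upper bound on blocks that can be resident on the device at once. */
static uint64_t resident_blocks(uint64_t blocks_per_sm, uint64_t sm_count) {
    return blocks_per_sm * sm_count;
}

/* Total number of blocks in a 2D grid. */
static uint64_t grid_blocks(uint64_t grid_x, uint64_t grid_y) {
    return grid_x * grid_y;
}

/* Minimum number of "waves" of resident blocks needed to run the whole
   grid, i.e. ceil(total / resident). */
static uint64_t waves_needed(uint64_t total, uint64_t resident) {
    return (total + resident - 1) / resident;
}
```

With the figures above, resident_blocks(8, 16) is 128 and grid_blocks(65535, 65535) is 4294836225, so the grid needs at least 33553409 waves. A block that waits on a block from a later wave would wait forever, because the later block cannot start until a resident one finishes.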
You can't go wrong with zombies: the public loves them.
It is necessary to initialize all variables. This will cause a run time error.
This probably should read: 'accessing uninitialized variables causes a runtime error'.
It is necessary to check your memory indexes for all data structures, if they are wrong you will have run time errors.
What would be a 'memory index'? An index into an array? Yes indeed, an out-of-bounds array index will cause an error sooner or later. If your language doesn't check array bounds, like C, the bugs will be subtle and insidious.
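As a rough illustration of doing the check yourself in C, here is a small sketch. checked_get is a made-up helper, not a standard library function, and it assumes the caller knows the array length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Read data[index] only if index is within [0, len).
   Returns true and stores the value in *out on success;
   returns false and leaves *out untouched on an out-of-bounds index. */
static bool checked_get(const int *data, size_t len, size_t index, int *out) {
    if (index >= len)
        return false;
    *out = data[index];
    return true;
}
```

Without a check like this, reading data[len] in C is undefined behavior: it may crash, or it may quietly return garbage, which is what makes these bugs so hard to track down.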