What do you mean by depth first vs breadth first? Naively I would assume task sc...

StefanKarpinski · on July 24, 2019

Consider a machine with N cores running a program which spawns N parallel tasks, each of which in turn spawns N parallel tasks. Which subtasks do the cores work on initially? If it's the first subtask of each top-level task, that's breadth-first scheduling. If it's all N subtasks of the first top-level task, that's depth-first scheduling. The latter is better for high-performance computing because it tends to be the lower level libraries that are really carefully tuned to make optimal use of the machine (think BLAS/FTT/parallel sum/prefix sum). Previously it was unknown how to implement this kind of scheduling efficiently. See this talk by Kiran Pamnany for more details:

https://www.youtube.com/watch?v=YdiZa0Y3F3c

dwohnitmok · on July 24, 2019

Ah I see what you mean! I really hate to pile on with the Haskell observation but the 1995 paper by Blelloch that is mentioned in that talk is exactly the heart of GHC's data parallelism as well (under the name nested data parallelism).

See slide 8 here: https://www.microsoft.com/en-us/research/wp-content/uploads/...

Later versions of Blelloch's language are referred to in other parallel Haskell papers.

FWIW I think this is nonetheless fantastic news for Julia! It's certainly poised to do great work in areas that I doubt will get any Haskell traction anytime soon (if ever) for good reasons.

EDIT: Interesting, I think Julia is taking a coarser grained approach here that breaks down a bit for irregularly balanced execution trees, but relies less on fusion (i.e. doesn't use fusion at all) which is nice. I'm curious to see how this particular implementation of the "depth first" idea goes!

EDIT EDIT: Actually I'm not sure how much fusion is necessary for GHC's approach... Time to investigate more closely. Also call out to anyone more familiar with Data Parallel Haskell to chime in. I'm getting out of my depth.

StefanKarpinski · on July 25, 2019

I'm not surprised that Haskell supports this kind of parallelism, although I wasn't aware of it previously. After all, if you're going to the trouble of being that pure, why not be parallel?