I was working with a small library that I implemented (with help) on a grad school research project. We were experimenting with various low-level techniques/approaches for implementing an operating system on a 1000+ core chip, exploring memory management and scheduling designs. To test our designs we wrote a number of toy benchmark programs, basically the typical set: merge_sort, fft, mat_mult, kmeans.... We ended up with a programming model not dissimilar to the Cilk model, though without the compiler support, so we were manually managing continuation scheduling with latches on atomic variables. That was somewhat annoying to do, but we were able to make it work. I looked at Cilk at the time and it definitely would have been nice to have compiler support, but I don't think it would have fundamentally changed the way we implemented our algorithms.
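To give a rough idea of what "continuation scheduling with latches on atomic variables" looks like without compiler support, here is a minimal sketch (not the original research library): a hand-rolled count-down latch over a std::atomic counter gates the combine step of a divide-and-conquer task, with std::thread standing in for the custom scheduler. The names SpinLatch and parallel_sum are illustrative, not from the original code.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Count-down latch on an atomic variable: the continuation spins
// (yielding) until every spawned subtask has checked in.
struct SpinLatch {
    explicit SpinLatch(int n) : remaining(n) {}
    void arrive() { remaining.fetch_sub(1, std::memory_order_acq_rel); }
    void wait() const {
        while (remaining.load(std::memory_order_acquire) != 0)
            std::this_thread::yield();
    }
    std::atomic<int> remaining;
};

// Divide-and-conquer sum: spawn the left half, run the right half
// inline, then wait on the latch before the "continuation" (the add).
long parallel_sum(const std::vector<long>& v, size_t lo, size_t hi) {
    if (hi - lo < 1024) {                 // sequential cutoff
        long s = 0;
        for (size_t i = lo; i < hi; ++i) s += v[i];
        return s;
    }
    size_t mid = lo + (hi - lo) / 2;
    long left = 0;
    SpinLatch latch(1);
    std::thread child([&] {               // "spawned" subtask
        left = parallel_sum(v, lo, mid);
        latch.arrive();
    });
    long right = parallel_sum(v, mid, hi);
    latch.wait();                         // sync point before the continuation
    child.join();
    return left + right;                  // continuation: combine the halves
}

int main() {
    std::vector<long> v(1 << 16, 1);
    std::printf("%ld\n", parallel_sum(v, 0, v.size()));
}
```

With compiler support (Cilk's spawn/sync), the latch and the explicit thread management disappear, but the shape of the algorithm stays the same, which is the point above.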
If you're parallelizing algorithms with divide-and-conquer behavior (recursive algorithms, or anything that forms a tree of tasks and sub-tasks), I think it's a very natural form of parallelism.
I did something similar as a C++ library for my Master's: http://people.cs.vt.edu/~scschnei/factory/ I felt it was an intuitive way of expressing task parallelism, but if the dependent tasks don't operate on strict subsets of each other's data, I agree it's not much help. If that's the case, then you've successfully expressed the parallelism, but the larger problem remains: synchronized access to data structures.
Of course, transactional memory could help there. (Hey, full circle.)