I'm not sure I understand this. Even if you buy the idea that "[y]ou can almost always use a simple locked data-structure for synchronization fairly painlessly," how is the choice of granularity anything more than the choice of what and where to lock? The choice of granularity is a response to the nature of the synchronization problem.
I think that he is correct in general, but his argument can be better constructed. A simpler way to illustrate the issue is: any shared memory model can be mapped to a message-passing model (in parallel computing) with equal efficiency. Since message-passing model is fairly easy and intuitive, the concern all boils down to find an optimal parallel solution to a certain problem. That step is really hard.
Message passing is easier than using shared-memory correctly, but ensuring that a non-trivial message passing implementation doesn't have any deadlocks is not what I would consider "fairly easy".
Message passing circumvents locking, but not necessarily synchronization and I would not describe using it as easy. The difficulty in concurrency is maintaining an understanding of all possible states at any given point and message passing doesn't fix that.
tl;dr you can deadlock in occam