It doesn't sound to me like the engines you dealt with use ECS, whose systems are usually run through a job system (your work units and functors), but correct me if I'm wrong.
The good job systems I've dealt with put their dependencies in the functors. So you "wait" on a job to finish, which is really a while loop that plucks and executes other jobs for as long as the dependency job hasn't finished. This kind of job system is nice to deal with because it is generally low overhead, which means all threads (processes really) are saturated with work nearly all the time.
I don't really remember any contended global state, because that's generally very slow, but maybe there were bits of our gameplay code I'm not aware of.
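To make the "wait is really a loop that runs other jobs" idea concrete, here is a minimal sketch in Python. Everything in it (`Job`, `JobSystem`, `submit`, `wait`, the `demo` jobs) is a hypothetical name made up for illustration, not any engine's API, and a real engine would do this in native code with lock-free or per-thread queues rather than one locked deque.

```python
import threading
import time
from collections import deque


class Job:
    """A unit of work: a functor plus a completion flag (hypothetical type)."""

    def __init__(self, fn):
        self.fn = fn
        self.done = threading.Event()
        self.result = None


class JobSystem:
    """Toy job system: one shared queue, N workers, help-while-waiting."""

    def __init__(self, num_workers=2):
        self._queue = deque()
        self._lock = threading.Lock()
        self._running = True
        self._workers = [
            threading.Thread(target=self._worker_loop, daemon=True)
            for _ in range(num_workers)
        ]
        for w in self._workers:
            w.start()

    def submit(self, fn):
        job = Job(fn)
        with self._lock:
            self._queue.append(job)
        return job

    def _try_run_one(self):
        # Pluck one job off the queue (if any) and execute it on this thread.
        with self._lock:
            job = self._queue.popleft() if self._queue else None
        if job is None:
            return False
        job.result = job.fn()
        job.done.set()
        return True

    def wait(self, job):
        # "Waiting" never parks the thread while work is available:
        # it keeps executing other queued jobs until the dependency is done.
        while not job.done.is_set():
            if not self._try_run_one():
                job.done.wait(0.001)  # queue empty, someone else is running it
        return job.result

    def _worker_loop(self):
        while self._running:
            if not self._try_run_one():
                time.sleep(0.001)

    def shutdown(self):
        self._running = False
        for w in self._workers:
            w.join()


def demo():
    js = JobSystem(num_workers=2)

    def physics():
        return sum(range(1000))

    def ai():
        return 7

    def frame():
        # The dependencies live inside the functor: it spawns the jobs
        # it needs and waits on them.
        p = js.submit(physics)
        a = js.submit(ai)
        return js.wait(p) + js.wait(a)

    total = js.wait(js.submit(frame))
    js.shutdown()
    return total
```

Because `wait` executes queued work itself, a job that waits on its own children cannot starve the pool, even if every worker is sitting in a `wait` at once. The trade-off in this sketch is that a waiting thread may pick up a long unrelated job just as its dependency finishes.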
The ECS concerns don't really relate to threading concerns.
I have worked with and without ECS, both with and without good threading models. ECS writes can cause problems if write locks need to be acquired, but that usually isn't a big deal.
In the "you're still going to have to wait for something" sense, sure. But the reason ECS exists is that the industry had to change its architecture to take advantage of many-core CPUs when we moved to them.
I'm struggling to understand what you want then, sorry. The systems that you say you would like to use (discrete jobs with dependencies) are the kind of systems the industry has been using since the advent of data-oriented architecture, which includes ECS. That is, one job worker process per core, plucking work off a queue and doing it.
In the engines I've dealt with, we don't usually have write locks, instead preferring separate copies of "last frame data" and "next frame data". And all our "read locks" are waits for jobs. Our game code is generally single threaded, but the main loop pretty much just kicks off jobs and waits for them.
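As a rough illustration of the "last frame data" / "next frame data" approach, here is a small Python sketch. The names (`update_range`, `simulate`) and the toy neighbour-averaging rule are assumptions picked for the example, and plain threads stand in for jobs. The point is only that every job reads the previous buffer and writes its own disjoint slice of the next one, so no lock is needed and the only synchronisation is waiting for the jobs to finish.

```python
import threading


def update_range(prev, nxt, start, end):
    # Reads only last-frame data, writes only its own slice of next-frame data.
    n = len(prev)
    for i in range(start, end):
        nxt[i] = (prev[(i - 1) % n] + prev[i] + prev[(i + 1) % n]) / 3.0


def simulate(initial, frames, chunk=2):
    prev = list(initial)       # "last frame data": read-only during a frame
    nxt = [0.0] * len(prev)    # "next frame data": each slot has one writer
    for _ in range(frames):
        jobs = [
            threading.Thread(
                target=update_range,
                args=(prev, nxt, s, min(s + chunk, len(prev))),
            )
            for s in range(0, len(prev), chunk)
        ]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()           # the only "read lock": wait for the jobs
        prev, nxt = nxt, prev  # swap buffers for the next frame
    return prev
```

Since nothing reads a value written during the same frame, the result does not depend on the order in which the chunks run. The cost is the memory for the second copy of the data.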
I guess what is a good threading model to you?
(As a side note, I've worked on projects that use ECS on a single core, and it still confers benefits there even though that's not what it was invented for.)