And for those of us who write C++ there's always Intel's Threading Building Blocks. (Not sure if any other language has something equal in power and convenience.)
TBB is built around tasks and nested parallelism rather than raw threads. We rely on much of what it offers, including the concurrent containers and flow graphs. Here's a nice overview by Mike Voss of what makes TBB different from OpenMP and C++ standard threading: https://software.intel.com/content/www/us/en/develop/article....