I have implemented not only parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I benchmarked it on some really beefy machines with 128 cores and reached over 10 GB/s decompression bandwidth. This works without any additional metadata, but if an index file with such metadata exists, it can double the decompression bandwidth and reduce memory usage. The single-core decoder still has lots of potential for optimization, though, because I had to write it from scratch.
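To show why an index helps with random access, here is a minimal sketch of the lookup side. It assumes a hypothetical index that is a sorted list of checkpoints. Each checkpoint maps an uncompressed byte offset to the compressed bit offset where a deflate block starts. The `Checkpoint` type, `find_checkpoint` and all numbers below are illustrative and are not pragzip's actual index format or API. A real gzip index would also need to store the 32 KiB window preceding each checkpoint so decoding can resume there.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical index entry (not pragzip's format).
    uncompressed_offset: int    # byte offset in the decompressed data
    compressed_bit_offset: int  # bit offset of the deflate block in the .gz file
    # A real index would also keep the preceding 32 KiB window here.


def find_checkpoint(index, target):
    """Return the last checkpoint whose uncompressed offset is <= target."""
    keys = [c.uncompressed_offset for c in index]
    i = bisect.bisect_right(keys, target) - 1
    if i < 0:
        raise ValueError("target precedes the first checkpoint")
    return index[i]


# Made-up example index: the first block starts right after a 10-byte gzip header.
index = [
    Checkpoint(0, 80),
    Checkpoint(4 << 20, 10_000_000),
    Checkpoint(8 << 20, 20_000_000),
]

target = 5_000_000
cp = find_checkpoint(index, target)
# Decoding would start at cp.compressed_bit_offset and discard
# (target - cp.uncompressed_offset) bytes of output before the requested data.
print(cp.uncompressed_offset, target - cp.uncompressed_offset)
```

Without such an index, a parallel decoder has to locate deflate block boundaries in the compressed stream on its own, which is extra work that an index lets it skip.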