Indeed, I ought to have mentioned the cost of cache misses after a context switch too. That cost also depends on working set size, but the relationship looks different from the one for TLB misses. However, my understanding is that an intra-task thread switch would not cause cache invalidation either (there's no need; the address space remains the same).