Indeed, I ought to have mentioned the cost of cache misses after a context switch too. It also depends on working set size, but the relationship looks different from that of TLB misses. However, my understanding is that a switch between threads of the same task would not cause cache invalidation either (there's no need; the address space remains the same).
There's also the cost of cache eviction: while the other thread runs, it can push your lines out of the cache. It's not something you have to manage manually, but it's a cost you may pay nonetheless.
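
To put a rough shape on the working-set dependence, here is a back-of-envelope sketch in Python. It is a toy model, not a measurement: the function name `refill_misses` and all hardware numbers (64-byte lines, a 1 MiB cache, 4 KiB pages, a 64-entry TLB) are assumptions chosen for illustration, and it takes the worst case where everything the resumed thread had cached was evicted while it was off-CPU.

```python
import math

def refill_misses(working_set_bytes, unit_bytes, capacity_units):
    """Upper bound on cold misses needed to re-warm a structure after a switch.

    Assumes the worst case: every entry belonging to the resumed thread was
    evicted while it was off-CPU. The structure cannot hold more than
    capacity_units entries, so the count is capped there.
    """
    units_touched = math.ceil(working_set_bytes / unit_bytes)
    return min(units_touched, capacity_units)

# Assumed, illustrative hardware parameters (not measurements).
LINE_BYTES = 64
CACHE_LINES = (1 << 20) // LINE_BYTES   # 1 MiB cache -> 16384 lines
PAGE_BYTES = 4096
TLB_ENTRIES = 64

for ws_kib in (16, 256, 4096):
    ws = ws_kib * 1024
    cache = refill_misses(ws, LINE_BYTES, CACHE_LINES)
    tlb = refill_misses(ws, PAGE_BYTES, TLB_ENTRIES)
    print(f"{ws_kib:>5} KiB working set: up to {cache} cache misses, {tlb} TLB misses")
```

Under these assumed numbers the TLB bound saturates at a 256 KiB working set, while the cache bound keeps growing until the working set reaches the cache size. That is one way the two relationships can look different, though real costs also depend on per-miss latency, prefetching, and how much the other thread actually evicted.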