Efficient bulk TLB manipulation is also part of the secret sauce behind Azul's pauseless GC.
AIUI they have a custom kernel module that provides more efficient virtual memory operations than the Linux kernel APIs can.
TL;DR: it pays off to invalidate a range of pages per inter-processor interrupt rather than one page per interrupt, and DTrace can get the numbers to prove it.
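
To make the arithmetic concrete, here's a toy model of the interrupt count. It is not Azul's code or any kernel's actual shootdown path. The function name, the parameters, and the assumption that every invalidation request interrupts every other CPU exactly once are all mine, purely for illustration.

```python
def shootdown_ipis(pages: int, pages_per_ipi: int, remote_cpus: int) -> int:
    """Count inter-processor interrupts needed to invalidate `pages` pages.

    Toy assumption (hypothetical, not taken from any real kernel): each
    invalidation request covers up to `pages_per_ipi` pages and sends one
    interrupt to each of the `remote_cpus` other CPUs.
    """
    if pages < 0 or pages_per_ipi < 1 or remote_cpus < 0:
        raise ValueError("pages and remote_cpus must be >= 0, pages_per_ipi >= 1")
    # Ceiling division: number of requests needed to cover all pages.
    requests = -(-pages // pages_per_ipi)
    return requests * remote_cpus


# Example: unmapping 512 pages on an 8-CPU box (7 remote CPUs).
per_page = shootdown_ipis(pages=512, pages_per_ipi=1, remote_cpus=7)
ranged = shootdown_ipis(pages=512, pages_per_ipi=512, remote_cpus=7)
print(per_page, ranged)  # 3584 7
```

The model only counts interrupts. It says nothing about what each one costs on real hardware, which is the part you'd actually want to measure with DTrace.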