Those are really useful numbers. I think a lot of it can be chalked up to virtualization, but we should definitely explore IRQ pinning for queues more. Any good starting points or reading? Are you mostly using taskset?
Taskset is fine for process pinning. Don't forget about hyperthreading: you want to keep each thread on its own hardware thread. For IRQ pinning, see an example script I have:
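For anyone reading along without that script, here is a rough sketch of the general idea in Python. It is not the script mentioned above, just one way it could be done. Assumptions: the NIC queues show up in /proc/interrupts under the name `eth0`, the CPU list is hand-picked, and the helper names (`find_irqs`, `plan`, `apply_plan`) are made up for illustration. It relies on the standard Linux `/proc/irq/<N>/smp_affinity_list` file, which takes a CPU list such as `2`.

```python
import re

def find_irqs(interrupts_text, name):
    """Return IRQ numbers whose /proc/interrupts line mentions `name`."""
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)  # numbered IRQ rows only, skips NMI/LOC etc.
        if m and name in line:
            irqs.append(int(m.group(1)))
    return irqs

def plan(irqs, cpus):
    """Assign IRQs to CPUs round-robin; returns {irq: cpu}."""
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}

def apply_plan(assignments):
    """Write each assignment to procfs. Needs root; not called in this demo."""
    for irq, cpu in assignments.items():
        with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
            f.write(str(cpu))

# Made-up sample of /proc/interrupts so the demo runs anywhere.
SAMPLE = """\
           CPU0       CPU1
  0:         45          0   IO-APIC   2-edge      timer
 24:       1000        200   PCI-MSI 524288-edge      eth0-TxRx-0
 25:        900        300   PCI-MSI 524289-edge      eth0-TxRx-1
NMI:          0          0   Non-maskable interrupts
"""

# Assumed CPU choice: 0 and 2, e.g. to avoid two hyperthread siblings.
print(plan(find_irqs(SAMPLE, "eth0"), [0, 2]))  # {24: 0, 25: 2}
```

On a real box you would read /proc/interrupts instead of `SAMPLE` and call `apply_plan` as root. If irqbalance is running it may rewrite the affinities, so it usually has to be stopped or told to leave those IRQs alone.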
Because in-house we have a custom version of memcache. We rewrote memcache's slab allocator, and for some use cases it is more memory-efficient than Redis.