TCG has historically focused more on accuracy than performance. It translates many guest architectures to many host architectures, and isn't particularly specialized for any given host CPU. Many instructions are lowered to calls into C helpers rather than being JIT-compiled inline. Last I checked it had no vector-to-vector JIT, i.e. guest SIMD instructions weren't mapped onto host SIMD instructions. It also doesn't map guest memory directly into a single shared address space (in system emulation, at least): guest memory accesses go through a software translation layer, which is expensive. I think Rosetta, for example, uses a shared address space for guest and host code. Honestly, on 64-bit CPUs, especially with pointer authentication on the M1, the risk of the guest accidentally messing with host/JIT memory is low.
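
To make the memory indirection point concrete, here's a toy Python model of the two approaches. This is not QEMU's or Rosetta's actual code; `SoftMMU`, `DirectMapped`, the page size, and the TLB layout are all made up for illustration. The thing to notice is how much work sits in front of every single load in the first version compared to the second.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages, purely illustrative
PAGE_SIZE = 1 << PAGE_BITS
PAGE_MASK = PAGE_SIZE - 1
TLB_ENTRIES = 256                   # hypothetical TLB size


class SoftMMU:
    """Toy software MMU: every guest load goes through a TLB lookup."""

    def __init__(self, host_mem, page_table):
        self.host_mem = host_mem            # bytearray standing in for host RAM
        self.page_table = page_table        # guest page number -> host offset of that page
        self.tlb = [None] * TLB_ENTRIES     # direct-mapped cache of (guest page, host offset)
        self.slow_path_hits = 0             # counts TLB misses

    def load8(self, guest_addr):
        page = guest_addr >> PAGE_BITS
        slot = page % TLB_ENTRIES
        entry = self.tlb[slot]
        if entry is None or entry[0] != page:
            # TLB miss: in a real emulator this is where generated code
            # would call out to a helper to walk the guest page tables.
            self.slow_path_hits += 1
            entry = (page, self.page_table[page])
            self.tlb[slot] = entry
        return self.host_mem[entry[1] + (guest_addr & PAGE_MASK)]


class DirectMapped:
    """Toy shared address space: guest address = host address - constant."""

    def __init__(self, host_mem, guest_base):
        self.host_mem = host_mem
        self.guest_base = guest_base        # fixed offset of guest memory in host memory

    def load8(self, guest_addr):
        # One add, one load. No lookup, no compare, no slow path.
        return self.host_mem[self.guest_base + guest_addr]


if __name__ == "__main__":
    mem = bytearray(4 * PAGE_SIZE)
    mem[2 * PAGE_SIZE + 5] = 0xAB

    mmu = SoftMMU(mem, {0: 2 * PAGE_SIZE})
    print(hex(mmu.load8(5)), "misses:", mmu.slow_path_hits)

    direct = DirectMapped(mem, 2 * PAGE_SIZE)
    print(hex(direct.load8(5)))
```

Even on a TLB hit, the first version pays for a shift, an index, a compare, and a branch before touching memory. The direct-mapped version gives that up, and with it the ability to catch stray guest accesses in software, which is the tradeoff I mean above.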