Huh, it would be interesting to see some non-Tailscale benchmarks of this. Assuming the kernel impl is actually optimized, shouldn't it be theoretically impossible for userland WireGuard to exceed its performance?
I ran the same benchmarks they listed here[0] and did some practical tests. As of a week after the article was written, Tailscale was faster than kernel WireGuard.