It's more or less impossible to show the difference between using popcnt and the expanded emulation code on a platform that doesn't support popcnt.
That doesn't change the fact that the program itself was being compiled on a platform that doesn't support popcnt, so no amount of compiler arguments would fix that.
Here is an x86-64 sample instead:
https://gcc.godbolt.org/z/qGzWo39b6
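
For anyone who wants to try the comparison locally, here is a rough sketch of the two variants. It is not the code from the link above; the function names are made up for illustration. It assumes GCC or Clang, since `__builtin_popcount` is a compiler builtin. On x86-64, building with `-mpopcnt` should let the builtin become a single popcnt instruction, while without it the compiler typically falls back to a library call or an expanded bit-twiddling sequence much like the hand-written version.

```c
#include <stdint.h>

/* Hypothetical name: hand-written fallback, similar in spirit to the
   expanded emulation used when the target has no popcnt instruction. */
unsigned popcount_emulated(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* add 4 bytes */
}

/* Hypothetical name: thin wrapper around the GCC/Clang builtin.
   Whether this compiles to a popcnt instruction depends on the
   target and flags (for example -mpopcnt on x86-64). */
unsigned popcount_builtin(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
```

Both functions should return the same values; only the generated assembly differs, which is the part worth comparing in the compiler output.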