I think Clang generates slightly better code than Go here, because it fuses the mov and the or into a single lea (demoting the or to an add in the process): https://godbolt.org/z/1Gz5KzrzK
I'm unsure whether there is a perf difference, though, as register-to-register mov may get boiled away to nothing in register renaming.
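To illustrate why the or can be demoted to an add, here is a minimal sketch in C. The functions are hypothetical and not necessarily what the linked snippet compiles; the point is only that when the two operands share no set bits, or and add give the same result, which lets a compiler fold the whole thing into one lea.

```c
#include <stdint.h>

/* Hypothetical example: shift left by one, then set the low bit.
 * After the shift the low bit is always 0, so OR-ing in 1 cannot
 * produce a carry and is equivalent to adding 1. */
uint32_t tag_with_or(uint32_t x) {
    return (x << 1) | 1u;
}

/* Same computation written with add. On x86-64 a compiler may emit
 * either form as a single instruction along the lines of
 *     lea eax, [rdi + rdi + 1]
 * (exact output depends on compiler and flags; this is an assumption,
 * not a copy of the linked output). */
uint32_t tag_with_add(uint32_t x) {
    return (x << 1) + 1u;
}
```

lea only does addition, so the or-to-add rewrite is what makes the fusion possible in the first place.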