> I know that cross-compiling Linux on an Intel X86 CPU isn't necessarily going to be as fast as compiling on an ARM64-native M1 to begin with
Is that true? If so, why? (I don't cross compile much, so it isn't something I've paid attention to).
The architecture the compiler is running on doesn't change what the compiler is doing. It's not like the fact that it's running on ARM64 gives it some special powers to suddenly compile ARM64 instructions better. It's the same compiler code doing the same things and giving the same exact output.
Some cross-compilation may need some emulation to fold constant expressions. For example, if you write code using 80-bit floats for x86 and cross-compile on a platform that doesn't have them, they must be emulated in software. The cost of this feels small, but it gets more expensive if regular double-precision floating-point arithmetic is also emulated when cross-compiling. Obviously some programs have more constant folding to do during compilation than others.
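As a rough sketch of what that looks like (assuming an x86 target where long double is the 80-bit x87 extended format; the constant below is purely illustrative): the initializer is a constant expression, so it has to be folded at build time, and a compiler hosted on ARM64 has no x87 hardware to fold it with, so it does the arithmetic in a software float library (GCC folds through MPFR, LLVM through APFloat).

    /* Illustrative only: an x86 target where long double is the 80-bit
     * x87 extended format.  The initializer is a constant expression,
     * so the compiler folds it at build time.  A compiler hosted on
     * ARM64 has no x87 unit and must do that folding in software. */
    #include <stdio.h>

    static const long double k = (1.0L / 3.0L) * 3.0L - 1.0L;

    int main(void)
    {
        printf("%.21Lg (sizeof long double = %zu)\n", k, sizeof(long double));
        return 0;
    }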
My understanding is that LLVM already does software emulation of floating point for const evaluation, in order to eliminate any variation due to the host architecture.
Is constant folding going to be a bottleneck? In this particular instance, in the kernel, floating point is going to be fairly rare anyway, and integer constant folding is going to be more or less identical on 64-bit x86 and ARM.
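For a sense of what most of the kernel's constant folding actually is, here's a minimal sketch (the macro names mirror kernel conventions but aren't pulled from the real headers): it's plain 64-bit integer arithmetic, which an x86-64 host and an ARM64 host fold to exactly the same values.

    /* Kernel-style integer constant folding.  Everything here is plain
     * 64-bit integer arithmetic, so an x86-64 host and an ARM64 host
     * fold it identically.  Macro names are illustrative, not copied
     * from the actual kernel headers. */
    #include <stdio.h>

    #define PAGE_SHIFT    12
    #define PAGE_SIZE     (1UL << PAGE_SHIFT)
    #define PAGE_MASK     (~(PAGE_SIZE - 1))
    #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

    int main(void)
    {
        /* Folded to 16384 at compile time on either host. */
        printf("%lu\n", PAGE_ALIGN(12345UL));
        return 0;
    }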
In theory, yeah. In practice, a native compiler may have a slightly different target configuration than a cross compiler. For example, a cross compiler may default to soft float, while a native compiler would use hard float if the system it's built on supports it. Basically, ./configure --cross=arm doesn't always produce the same compiler that you get by running ./configure on an arm system. As a measurable difference it's probably pretty far into the weeds, but benchmarks can be oddly sensitive to such differences.
There's no reason for a cross-compiler to be slower than a native compiler.
If your compiler binary is built for architecture A and emits code for architecture B, it's going to perform about the same as a compiler that's built for architecture A and emits code for architecture A.
Well, there's one: if people tend to compile natively much more often than they cross-compile, then it would make sense to spend optimization time on what benefits users.
Yes, but you would probably make those optimizations in C code, not assembly. The amd64 compiler is basically the same C code whether it's been bootstrapped on armv8 or amd64.
Well, to get a little nuanced, it depends on whether the backend for B is doing roughly the same stuff as the backend for A (e.g., the same optimizations). I have no idea whether that's generally true or not.