The first problem is that the error-return-pointer-argument approach must go through memory, because the caller can request that the return value go anywhere at all.
The second problem is that existing calling conventions for `struct { bool tag; union { .. }; }` put everything in memory anyway, using a hidden pointer argument. Further, there's no way to put this type in the standard library because C doesn't have generics.
The new implementation can put that single-bit tag in the CPU's carry flag where it has dedicated branch instructions and doesn't interfere with other values. It can leave the actual return value in a register without any kind of union aggregate lowering.
So far this is all just calling convention tweaks, and could be done by pattern-matching user-defined tagged unions, but building it into the language a) makes it possible to standardize its semantics and connect it to platforms' C ABIs so other languages can also participate and b) makes it far simpler to implement and use so it's actually likely to be adopted.
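For reference, here is a minimal sketch of the two existing C styles being compared above. The names (`parse_digit_errptr`, `parse_digit_tagged`, `enum parse_error`) are made up for illustration and are not from any proposal; the anonymous union assumes C11. How each form is actually lowered depends on the target ABI and compiler, and the carry-flag convention itself cannot be expressed in portable C, so it is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes, for illustration only. */
enum parse_error { PARSE_OK = 0, PARSE_EMPTY, PARSE_NOT_DIGIT };

/* Style 1: error-return-pointer argument.
   The callee has to store through `err`, because the caller is free to
   point it at any memory location. */
int parse_digit_errptr(const char *s, enum parse_error *err) {
    if (s == NULL || s[0] == '\0') { *err = PARSE_EMPTY; return 0; }
    if (s[0] < '0' || s[0] > '9')  { *err = PARSE_NOT_DIGIT; return 0; }
    *err = PARSE_OK;
    return s[0] - '0';
}

/* Style 2: hand-written tagged union, one such struct per payload type
   since C has no generics. */
struct digit_result {
    bool is_error;              /* the single-bit tag */
    union {
        int value;              /* valid when is_error == false */
        enum parse_error error; /* valid when is_error == true  */
    };
};

struct digit_result parse_digit_tagged(const char *s) {
    struct digit_result r;
    if (s == NULL || s[0] == '\0') {
        r.is_error = true;
        r.error = PARSE_EMPTY;
    } else if (s[0] < '0' || s[0] > '9') {
        r.is_error = true;
        r.error = PARSE_NOT_DIGIT;
    } else {
        r.is_error = false;
        r.value = s[0] - '0';
    }
    return r;
}
```

A caller of the second form branches on `r.is_error` before touching `r.value`; the proposal above would keep that source-level shape but let the tag travel in a flag and the payload in an ordinary return register.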
Why would it matter that the return-pointer write goes through memory? It's a handful of cycles. I'm curious what kind of function you envision that is so short-lived AND needs tricky error handling AND is extremely performance-sensitive.
All of them; it adds up. Calling conventions are a perfect place for this kind of micro-optimization, because they apply pervasively and (assuming you want better error-handling support) without any further change to program source.
The same reasoning applies to putting effort into register allocators, or switching from setjmp/longjmp to table-based unwinding, etc.
Micro-optimizations do not add up, especially not for something like error handling that does not concern most operations to begin with. An error-handling strategy like this clearly has its own cost in implementation complexity, such that the whole thing will collapse under its own weight before you even notice a speedup.
You need to make sure that you keep the size and complexity of the language and its specification within reasonable limits. So you can't just add "all of them" with a blanket statement that they will add up.