All of them; it adds up. Calling conventions are a perfect place for this kind of micro-optimization, because they apply pervasively and (assuming you want better error handling support) without additional changes to the program source.
The same reasoning applies to putting effort into register allocators, or switching from setjmp/longjmp to table-based unwinding, etc.
Micro-optimizations do not add up, especially not for something like error handling, which does not concern most operations to begin with. An error handling strategy like this clearly has its own cost in implementation complexity, such that the whole thing will collapse under its own weight before you even notice a speedup.
You need to make sure that you keep the size and complexity of the language and its specification within reasonable limits. So you can't just add "all of them" with a blanket statement that they will add up.