Requiring the caller to put all arguments on the stack isn't "zero cost." For a non-variadic call on ARM64, the first eight parameters (or more, if some are floats) will be passed in registers without ever touching the stack.
On x86-64, the caller also has to set %al to the number of vector registers used for the call, and the compilers I've seen always check %al and conditionally save those registers as part of the function prologue. Cheap, but not "zero cost."
You'll notice how `normal` takes all of its arguments out of registers `x0` through `x7` and places them on the stack for the call to `printf`. And you'll notice how `vararg` plays a bunch of games with the stack and never touches registers `x1` through `x7`. (It still uses `x0` because the first argument is not variadic.)
On the caller side, observe how `call_normal` places its values into `x0` through `x7` sequentially and then invokes the target function, while `call_vararg` places one value into `x0` and places everything else on the stack.
So, no, it looks to me like varargs very much change the calling convention.
On x86-64, the caller also has to set %al to the number of vector registers used for the call, and the compilers I've seen always check %al and conditionally save those registers as part of the function prologue. Cheap, but not "zero cost."