Doesn't every major cryptosystem have multiple ciphersuites, though?
There are things like SSL, SSH, GPG, TrueCrypt, BitLocker, /etc/passwd, ntpsec - even Git is trying to upgrade its hashes from SHA-1 to something longer. There are only a handful of exceptions, like TOTP.
Isn't it a must-have feature? Or has the feature become less important than it was 25 years ago when those protocols were being designed?
Yes, and every one of those major cryptosystems has been a debacle, in large part because of the negotiation imposed by ciphersuites. It is not a must-have feature; it's a feature that cryptography engineering best practice is rapidly coming to recognize as an anti-feature. See WireGuard for an example of the alternative: you version the whole protocol, and if some primitive you depend on has a break, you roll out a new version, which, historically, is what you've effectively had to do anyway in legacy protocols.
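The difference can be sketched in a few lines (an illustrative toy handshake, not WireGuard's actual wire format): instead of the client offering a list of suites that a server, or a man in the middle, can pick over, the client declares a single protocol version, and the server either speaks exactly that version or refuses.

```python
# Toy sketch of whole-protocol versioning (hypothetical message format,
# not WireGuard's real wire protocol).

PROTOCOL_VERSION = 2  # the one version this build of the software speaks


def server_accept(client_version: int) -> bool:
    # No ciphersuite list and no negotiation: either the peer speaks
    # exactly this version, or the handshake is rejected outright.
    # A primitive break means shipping version 3, not toggling a suite.
    return client_version == PROTOCOL_VERSION


assert server_accept(2)
assert not server_accept(1)  # old clients must upgrade
```

There is nothing on the wire for a downgrade attack to manipulate, because there is no menu to choose from in the first place.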
If you have multiple WireGuard versions, in a migration setting, you also need to do some negotiation at the start, no? Wouldn't that be potentially vulnerable to downgrade attacks as well?
So the migration looks like "upgrade the client or you won't be able to connect to the server any more"? What if you use the client to talk to multiple servers, some on the old version and some on the new one? Maybe via a config variable adjustable per server? Then you're doing out-of-band version negotiation. You might get away with that in the VPN setting, where entering arcane config variables is commonplace, but not in, e.g., the TLS setting.
I guess that's thanks to the fact that WireGuard is a new system, and new systems have little legacy bloat. Maybe the WireGuard author had golden hands and the system is perfect (it is indeed quite good), but I think WireGuard will eventually require a new version. Then one such migration solution will have to be chosen.
No, it's pretty widely recognized that WireGuard is in a sense a repudiation of "agility". You can look at, for instance, the INRIA analysis/proof paper to see how a bunch of disinterested cryptographers describe it: "many new secure channel protocols eschew standardisation in favour of a lean design that uses only modern cryptography and supports minimal cryptographic agility."
If you want to say "minimalist agility is good and you're just saying maximalist agility is bad", that's fine, we're just bickering about terms. But that's pretty obviously not what Schneier is talking about.
Everything I've read of Schneier's has given me the impression of the above definition: "support multiple cryptographic primitives and do not be overly coupled to a single primitive."
"The moral is the need for cryptographic agility. It’s not enough to implement a single standard; it’s vital that our systems be able to easily swap in new algorithms when required."
Do you have a link to something that in your mind represents what Schneier is talking about?
A modern cryptosystem wouldn't be designed to swap in new algorithms; it would pick a single set of algorithms and constructions, and version the whole system. Which is how WireGuard works: you can't run AES WireGuard, or WireGuard with the standard P-curves.
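WireGuard makes this concrete: the chosen primitives are baked into the handshake's construction label, which both sides hash into their initial state, so a variant with different primitives is simply a different protocol. A rough sketch (the label is the one from the WireGuard whitepaper; the derivation here is simplified to the single initial BLAKE2s step):

```python
import hashlib

# The primitive set is part of the protocol's identity. Both peers
# derive their initial chaining key by hashing this fixed label, so
# "WireGuard with AES" would produce a different chaining key and
# could never complete a handshake with real WireGuard.
CONSTRUCTION = b"Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"

# Simplified: in the whitepaper, the initial chaining key is
# Ci := HASH(CONSTRUCTION), with HASH being BLAKE2s.
chaining_key = hashlib.blake2s(CONSTRUCTION).digest()
```

Swapping a primitive means changing the label, which means versioning the whole protocol, exactly the design being described.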