The problem goes deeper than multichannel support, which is poor everywhere except Pro Tools, and even there it's fairly limited. Nuendo relies on VST3's model of channel set configuration, which is incapable of representing the kinds of multichannel layouts now in vogue for spatial audio rendering, like the ones you'd find in VR.
But even without leaving stereo, DAWs are already really bad. Think about how a pan or balance control works: instead of representing the perceptual localization of a sound, it asks you to control the gain feeding two channels, left and right. The semantics of those two channels change after rendering, which makes it hard to convey a mixer's or producer's intent to the audience through that interface. Compare a stereo mix on perfectly positioned near-field monitors, on reference headphones, in a car, and on a club system: the pan/balance setting in your DAW localizes the sound differently depending on the ultimate render target, whereas spatial audio needs localization to be independent of those targets.
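To make the point concrete, here's a minimal sketch of what a typical DAW pan knob actually computes: the standard constant-power (-3 dB) pan law, which maps a single pan position straight to two channel gains. Everything about *where* the sound ends up is baked into those gains and lost afterwards. (The function name and the [-1, +1] range are my own choices for illustration, not any particular DAW's API.)

```python
import math

def constant_power_pan(pan: float) -> tuple[float, float]:
    """Constant-power (-3 dB) pan law.

    pan is in [-1.0, +1.0]: -1 = hard left, 0 = center, +1 = hard right.
    Returns (left_gain, right_gain). Note the output is just two gains;
    the intended localization is not represented anywhere.
    """
    # Map pan position onto a quarter circle: 0 .. pi/2
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)

# Center pan: both channels at cos(pi/4) ~= 0.707, i.e. -3 dB each,
# so perceived loudness stays roughly constant across the sweep.
left, right = constant_power_pan(0.0)
```

The gains are all the downstream chain ever sees; whether those two channels end up on matched monitors or a car's door speakers is invisible to this math, which is exactly the problem described above.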
The solution is object-based audio, which isn't perfect and is a misnomer 90% of the time, but even that is a consequence of how badly DAWs handle spatialization. And it's not something you can really retrofit with a plugin: it requires hacking the mixbus configurations inside a DAW, and you can't propagate position metadata down to other plugins.
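The object-based idea can be sketched in a few lines: the object carries a *position* (here just an azimuth angle), and channel gains are derived at playback time for whatever speaker layout is actually present. This is a toy illustration using the stereophonic tangent panning law, not the metadata model of any real format (ADM, Atmos, etc., carry far richer data); the function names and the speaker-angle parameter are assumptions for the sketch.

```python
import math

def render_object(azimuth: float, speaker_angle: float) -> tuple[float, float]:
    """Render one audio object's azimuth (radians, 0 = front,
    positive = toward the left) to gains for a symmetric speaker
    pair at +/- speaker_angle, via the tangent panning law:

        (gL - gR) / (gL + gR) = tan(azimuth) / tan(speaker_angle)

    The object's position is the stored quantity; the gains are
    recomputed per render target instead of being fixed in the mix.
    """
    t = math.tan(azimuth) / math.tan(speaker_angle)
    t = max(-1.0, min(1.0, t))          # clamp to the speaker arc
    g_left = (1.0 + t) / 2.0
    g_right = (1.0 - t) / 2.0
    norm = math.hypot(g_left, g_right)  # keep constant power
    return g_left / norm, g_right / norm

# Same object position, two different playback targets:
narrow = render_object(math.pi / 12, math.pi / 6)  # +/-30 deg near-fields
wide = render_object(math.pi / 12, math.pi / 4)    # +/-45 deg wide pair
```

The point of the sketch: `narrow` and `wide` differ, because the gains adapt to the layout while the stored azimuth stays fixed, which is exactly the inversion of the DAW pan knob, where the gains are fixed and the perceived position drifts with the layout.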