I am really interested in this. The current paradigm from, e.g., Suno is an all-or-nothing finished product. Producing intermediate assets lets you do simple things like proper mastering, swapping instruments, or editing melodies.
I agree that what AI music needs to become an industry tool is the ability to create, access, and remix parts, but I think tools like Suno have more of the right idea versus tools like this. In order to write intermediate parts properly, you need to understand the whole and what things should sound like when put together, or when the notes are actually played by a musician. Then it's easier to work back from there: split your tracks apart into stems, transcribe your stems into MIDI, etc.
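For anyone curious what that "work back from the finished mix" path looks like today, here's a minimal sketch assuming the demucs CLI for stem separation and Spotify's basic-pitch library for MIDI transcription. Both tool choices, file names, and output paths are my assumptions, not anything these products expose, and basic-pitch is pitch-oriented, so the drum stem won't transcribe meaningfully.

```python
# Sketch: separate a mix into stems with demucs, then transcribe each stem to
# MIDI with basic-pitch. Assumes `pip install demucs basic-pitch` and ffmpeg.
import subprocess
from pathlib import Path

from basic_pitch import ICASSP_2022_MODEL_PATH
from basic_pitch.inference import predict


def separate_stems(mix_path: str, out_dir: str = "separated") -> list[Path]:
    """Run the demucs CLI; it writes vocals/drums/bass/other WAVs under out_dir."""
    subprocess.run(["demucs", "-o", out_dir, mix_path], check=True)
    song = Path(mix_path).stem
    # "htdemucs" is the default model name; adjust if you pass -n to demucs.
    return sorted(Path(out_dir, "htdemucs", song).glob("*.wav"))


def transcribe_to_midi(stem_path: Path, midi_dir: str = "midi") -> Path:
    """Transcribe one stem to MIDI with basic-pitch's note-estimation model."""
    _, midi_data, _ = predict(str(stem_path), ICASSP_2022_MODEL_PATH)
    out = Path(midi_dir, stem_path.stem + ".mid")
    out.parent.mkdir(parents=True, exist_ok=True)
    midi_data.write(str(out))  # midi_data is a pretty_midi.PrettyMIDI object
    return out


if __name__ == "__main__":
    # "song.mp3" is a placeholder for whatever render you got out of the model.
    for stem in separate_stems("song.mp3"):
        print("wrote", transcribe_to_midi(stem))
```

The point being: all of this is lossy post-hoc recovery from a finished render, which is exactly why native intermediate assets would be valuable.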
Suno et al. are moving in this direction, but I honestly think development will be somewhat stunted until we get good open-source models and something like ControlNets.