This is a fundamental result of any natural language system. English is inherently ambiguous, and inherently a one-dimensional medium.

Meanwhile, on a screen you can surface numerous easily discoverable options that a human being can quickly scan and choose from. The structure of the options conveys its own information, and you have lots of other signal channels, like color.

I feel like people who insist that voice-controlled computing is the future have never REALLY tried to control their computer by voice. That has been eminently possible since at least Windows XP, and indeed it is how many disabled users have been interacting with computers for decades.