I think an accessibility layer, à la Vim or built on something like tree-sitter, would help...

FloatArtifact · on Nov 10, 2022

I recognize that not all applications are accessible through accessibility APIs. However, there is no high-level library for accessing accessibility APIs. There are quite a few for automated UI testing, but none of them are performant enough for speech-to-code or screen readers. Test automation frameworks don't really require high performance.

What's important is accessibility: access to the application's content and context. It's more important than the speech recognition backend.

Speech recognition works best with a narrow context (when the commands for that context are available).

The level of performance we need as the speech recognition and screen reader communities is quite high. By the time the user starts speaking, and no later than decode time, the information needs to be available and parsed for navigation/editing. That way, these tokens can be weighted as commands during recognition.
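A rough sketch of that idea, in Python and assuming nothing about any particular engine: scrape identifiers from whatever text is visible, turn them into spoken phrases, and weight each phrase by how often it appears. The names `spoken_form`, `context_weights`, and `weighted_commands`, the "go to" prefix, and the frequency-based weighting are all hypothetical; how a real engine would consume such weights is engine-specific.

```python
import keyword
import re
from collections import Counter

# Python-style identifiers such as `parse_args` or `parseArgs`.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def spoken_form(identifier: str) -> str:
    """Turn `parse_args` or `parseArgs` into the phrase 'parse args'."""
    split_camel = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return " ".join(split_camel.replace("_", " ").lower().split())


def context_weights(visible_text: str, min_len: int = 3) -> dict:
    """Weight each on-screen phrase by its share of all identifier occurrences.

    Frequency weighting is a hypothetical scheme chosen for illustration;
    language keywords and very short names are skipped.
    """
    counts = Counter(
        spoken_form(tok)
        for tok in IDENTIFIER.findall(visible_text)
        if len(tok) >= min_len and not keyword.iskeyword(tok)
    )
    total = sum(counts.values())
    return {phrase: n / total for phrase, n in counts.items()}


def weighted_commands(visible_text: str) -> dict:
    """Hypothetical 'go to <phrase>' commands an engine could bias toward."""
    return {f"go to {p}": w for p, w in context_weights(visible_text).items()}


if __name__ == "__main__":
    screen = "def parse_args(argv):\n    return parse_args_impl(argv)\n"
    for command, weight in sorted(weighted_commands(screen).items()):
        print(f"{weight:.2f}  {command}")
```

The point is the timing: this list would have to be rebuilt whenever the visible text changes, so it is ready before the next utterance is decoded.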

Commands could be modeled after Vim's functionality, though.

Outside of tree-sitter, it would be interesting to hook into a Language Server Protocol (LSP) server as a client. However, I think servers only expect one client. In addition, I still see that as a lesser approach compared to dedicated support for a high-performance UI automation server that a speech recognition engine could leverage.
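For the language-server route, the wire format is simple enough to sketch. LSP messages are JSON-RPC 2.0 bodies preceded by a `Content-Length` header giving the body's length in bytes. The sketch below only builds and parses bytes; it does not launch a server, and a real client would first complete the `initialize`/`initialized` handshake and typically open the file with `textDocument/didOpen`. The helper names and the file URI are hypothetical. A `textDocument/documentSymbol` response lists functions, classes, and their ranges, which is the kind of structure a speech tool could turn into navigation targets.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the Content-Length header LSP uses."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def document_symbol_request(request_id: int, uri: str) -> bytes:
    """Build a textDocument/documentSymbol request for one file."""
    return encode_message({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/documentSymbol",
        "params": {"textDocument": {"uri": uri}},
    })


def decode_message(data: bytes) -> tuple:
    """Parse one framed message from the front of `data`.

    Returns (message_dict, leftover_bytes) so a reader loop can keep going.
    """
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    length = None
    for line in header.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length")
    if len(rest) < length:
        raise ValueError("incomplete body")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


if __name__ == "__main__":
    # Hypothetical file URI; a real client would send this over the server's stdin.
    print(document_symbol_request(1, "file:///tmp/example.py"))
```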

FloatArtifact · on Nov 10, 2022

Yes, minimizing the number of commands and making them as specific as possible, by understanding the context of where the user is, optimizes the user's time spent navigating.

Imagine even more precise commands: 'next function' followed by a letter, which navigates only to a function whose name starts with that letter. Really, the possibilities are endless when we have complete context of the screen and the structure of the code.
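Here is a minimal sketch of 'next function' plus a letter. It uses Python's built-in `ast` module as a stand-in for a tree-sitter parse tree, and it reads "a function with that letter" as "a function whose name starts with that letter"; both are assumptions, as are the helper names `function_starts` and `next_function`. The cursor is modeled as a 1-based line number.

```python
import ast
from typing import List, Optional, Tuple


def function_starts(source: str) -> List[Tuple[int, str]]:
    """(line, name) for every function or method definition, in line order."""
    tree = ast.parse(source)
    found = [
        (node.lineno, node.name)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(found)


def next_function(source: str, cursor_line: int, letter: Optional[str] = None) -> Optional[int]:
    """Line of the first function after `cursor_line`.

    If `letter` is given, only names starting with it (case-insensitive) count.
    Returns None when nothing matches.
    """
    for line, name in function_starts(source):
        if line <= cursor_line:
            continue
        if letter is None or name.lower().startswith(letter.lower()):
            return line
    return None


SAMPLE = '''\
def alpha():
    pass

def beta():
    pass

class Box:
    def bump(self):
        pass

def gamma():
    pass
'''

if __name__ == "__main__":
    print(next_function(SAMPLE, cursor_line=1))              # "next function"   → 4
    print(next_function(SAMPLE, cursor_line=1, letter="g"))  # "next function g" → 11
```

Because the search only looks past the cursor, the same spoken command resolves differently depending on where the user is, and the letter narrows it to a single target instead of several repeated "next function" commands.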

Someday I hope for the release of something like Stable Diffusion for voice coding: an open, complete pipeline that users can iterate on quickly and innovate with!