Perhaps this is the wrong place to ask, but it seems relevant. Having skimmed th...

llanowarelves · on Aug 30, 2022

Have you tried Sikuli? It uses OpenCV in the way you want.

stavros · on Aug 30, 2022

How fast is it? I tried to do something similar myself but it pegged a CPU core for 5 fps.

phil294 · on Aug 30, 2022

> I can't see any options around automating selection/clicking of dialogue box buttons?

I can only give you some technical details for ahk_x11 here. I think it would be ControlClick; the Windows docs for it are at [1]. It is not present in ahk_x11 right now, and I simply don't know yet whether it is possible [2]. If it is, I definitely want to have it. Other than that, I can only think of clicking on fixed coordinates, such as `MouseClick, left, 200, 300`. I haven't implemented `CoordMode` yet, purely for reasons of prioritization, although it would be rather important for that. For a generic solution, there's also ImageSearch [3], available in more recent Windows AHK versions, which I'd like to have at some point too. Some OCR command would be cool as well.

This will all take a while though, if nobody else joins in.

Thanks for your viewpoints though, I agree with the sibling comment that these insights are valuable.

[1] https://www.autohotkey.com/docs/commands/ControlClick.htm [2] https://github.com/phil294/AHK_X11/issues/3 [3] https://www.autohotkey.com/docs/commands/ImageSearch.htm
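To make the ImageSearch idea concrete, here is a minimal sketch of what such a command does conceptually: scan a screenshot for a smaller reference image and return where it was found. This is an assumption-laden illustration, not how AHK or ahk_x11 implements it. The function name `image_search` and the list-of-rows pixel format are made up for the example, and it only finds exact matches, whereas real tools typically tolerate some colour variation.

```python
def image_search(screen, needle):
    """Return (x, y) of the top-left corner of the first exact match, else None.

    Both arguments are hypothetical pixel grids: a list of rows, each row a
    list of pixel values (ints or RGB tuples).
    """
    screen_h, screen_w = len(screen), len(screen[0])
    needle_h, needle_w = len(needle), len(needle[0])
    for y in range(screen_h - needle_h + 1):
        for x in range(screen_w - needle_w + 1):
            # Compare the needle row by row against the window at (x, y).
            if all(screen[y + dy][x:x + needle_w] == needle[dy]
                   for dy in range(needle_h)):
                return (x, y)
    return None


# A 5x4 "screenshot" of zeros with a 2x2 button image placed at x=2, y=1.
screen = [[0] * 5 for _ in range(4)]
screen[1][2], screen[1][3] = 1, 2
screen[2][2], screen[2][3] = 3, 4
button = [[1, 2], [3, 4]]

print(image_search(screen, button))  # (2, 1)
```

The returned coordinates could then be fed to a fixed-coordinate click. The brute-force scan is also a hint at why this approach can get CPU-hungry on full-size screenshots.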

chefandy · on Aug 30, 2022

Out of curiosity, is there a reason you're not just biting the bullet and going with a more accessibility-focused proprietary OS? Seems like a big compromise for supporting FOSS, so I'm sure there's more to it. Always interested in hearing accessibility-involved use cases as an interface designer.

pomatic · on Aug 31, 2022

By "a more accessibility-focused proprietary OS", do you mean MS Windows? Or macOS? Or both? So far, the strategy of controlling X Window System apps via piecemeal scripts seems to be paying off OK. The ideal would be full voice control of all UI interactions at an application level. Free-form text entry in, say, an editor is easier to deal with. It is handled by injecting keyboard events from third-party speech recognition tools that allow for correction prior to final confirmation if required.

If PowerShell or AppleScript have significantly better capabilities in this arena (or if there is an alternative tool I should be looking at, or other resources that might be useful in the quest), it would be great to hear. At the end of the day though, the end user is a dev, and wants a Linux desktop.

chefandy · on Aug 31, 2022

Pretty sure Mac has some similar stuff, but this is the Windows 10/11 built-in voice command functionality.

https://support.microsoft.com/en-us/windows/windows-speech-r...

I imagine someone could use their computer almost entirely through that, save for things that require precise mouse usage, like photo editing or video games. Even then, you can achieve rough mouse usage through their on-screen grid, so you can do some things.

chefandy · on Aug 31, 2022

I know that Windows is generally the choice for blind users because of its well-established accessibility framework.

ttnj9ZKCVQAAbxK · on Aug 30, 2022

You could probably do it pretty easily for some dialogs, i.e. those drawn by some framework or other. But good luck doing it for all the other frameworks too. Your thought is probably the most effective...

bmn__ · on Aug 31, 2022

> this must be a solved problem

Yes, this is called a11y (accessibility). Install at-spi; run tools such as xwininfo, dogtail/sniff, accerciser, and qdbusviewer to find dialogues and buttons; then use any of the desktop automation tools to press them.

> buttons based on their label (OK/Cancel etc.)

That only works for English. Automate based on the type, not the label.
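A rough sketch of what "automate based on the type" could look like: walk the accessibility tree and pick out buttons by role, ignoring their labels. Everything here is an assumption for illustration. The role strings ("dialog", "push button", and so on) are the names I would expect AT-SPI to report and may differ between versions, and `FakeNode` is a made-up stand-in so the example runs without a live desktop. It only mimics the `childCount`, `getChildAtIndex()` and `getRoleName()` members that pyatspi accessibles are assumed to provide.

```python
# Role names are assumptions; check what accerciser shows on your system.
DIALOG_ROLES = {"dialog", "alert", "file chooser"}
BUTTON_ROLES = {"push button", "button"}


def find_dialog_buttons(node, inside_dialog=False):
    """Collect button-like nodes that live inside a dialog-like ancestor."""
    role = node.getRoleName()
    inside = inside_dialog or role in DIALOG_ROLES
    found = [node] if inside and role in BUTTON_ROLES else []
    for i in range(node.childCount):
        child = node.getChildAtIndex(i)
        if child is not None:
            found.extend(find_dialog_buttons(child, inside))
    return found


class FakeNode:
    """Hypothetical stand-in for an accessible object (not part of AT-SPI)."""

    def __init__(self, role, name="", children=()):
        self._role, self.name, self._children = role, name, list(children)

    @property
    def childCount(self):
        return len(self._children)

    def getChildAtIndex(self, i):
        return self._children[i]

    def getRoleName(self):
        return self._role


# A German-language dialog: the labels differ, the roles do not.
app = FakeNode("application", "demo", [
    FakeNode("frame", "Main", [FakeNode("push button", "Toolbar")]),
    FakeNode("dialog", "Speichern?", [
        FakeNode("label", "Änderungen speichern?"),
        FakeNode("push button", "Abbrechen"),
        FakeNode("push button", "OK"),
    ]),
])
names = [b.name for b in find_dialog_buttons(app)]
print(names)  # ['Abbrechen', 'OK']
```

Against a real desktop, the root would presumably come from something like `pyatspi.Registry.getDesktop(0)`, and a found button could be pressed through its action interface (`button.queryAction().doAction(0)`), assuming pyatspi is installed and the application exposes itself via at-spi.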

pomatic · on Aug 31, 2022

Thank you for the a11y tip, that's really helpful!

The point around internationalisation is well made. To be clear, I'm looking to implement a generic solution that potentially works for any application on the desktop. I do not have the source code for the underlying applications, so I'm not sure how, or whether, I can discover the button type 'externally'. A per-application config file would be acceptable, though, and could address the language-related issues, especially as such files could then potentially be crowd-sourced.
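For what it's worth, here is a minimal sketch of how such a per-application config might be shaped and looked up. The schema, the application name `example-editor`, the action names and the `resolve` helper are all hypothetical, invented purely to illustrate the idea. Each entry pairs a button role with per-language labels, so a script can prefer the role and fall back to the label for the user's locale.

```python
import json

# Hypothetical crowd-sourced config: one block per application.
CONFIG_TEXT = """
{
  "example-editor": {
    "confirm": {"role": "push button",
                "labels": {"en": "OK", "de": "OK", "fr": "Valider"}},
    "cancel":  {"role": "push button",
                "labels": {"en": "Cancel", "de": "Abbrechen", "fr": "Annuler"}}
  }
}
"""


def resolve(config, app, action, locale):
    """Return (role, label) for a voice command, falling back to English."""
    entry = config[app][action]
    labels = entry["labels"]
    language = locale.split("_")[0]  # "de_DE" -> "de"
    label = labels.get(locale) or labels.get(language) or labels.get("en")
    return entry["role"], label


config = json.loads(CONFIG_TEXT)
print(resolve(config, "example-editor", "cancel", "de_DE"))
# ('push button', 'Abbrechen')
print(resolve(config, "example-editor", "cancel", "ja_JP"))
# ('push button', 'Cancel')
```

A voice command such as "cancel" would then map to an action name, and the (role, label) pair would be handed to whatever tool actually locates and presses the button.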