
Perhaps this is the wrong place to ask, but it seems relevant. Having skimmed the documentation, I can't see any options for automating the selection/clicking of dialogue box buttons.

I have a user who has limited dexterity, and the ability to have a macro that selects/clicks on (modal?) dialogue box buttons based on their label (OK/Cancel etc.) would be a life saver for them. (A pre-determined set of keystrokes based on knowledge of the tab order would not work - this needs to be generic, based on the button text for my application.)

As things stand, I'm looking at hacking a video capture device and OpenCV together, but I can't help but think this must be a solved problem and I just have poor google-fu?




Have you tried Sikuli? It uses OpenCV in the way you want.

http://sikulix.com/


How fast is it? I tried to do something similar myself but it pegged a CPU core for 5 fps.


> I can't see any options around automating selection/clicking of dialogue box buttons?

I can only give you some technical details for ahk_x11 here. It would be ControlClick, I think; the Windows docs for it are at [1]. It is not present in ahk_x11 right now, and I simply don't know yet whether it is possible [2]. If it is, I definitely want to have it. Other than that, I can only think of clicking on fixed coordinates, such as `MouseClick, left, 200, 300`. I haven't done `CoordMode` yet, which would be rather important for that, purely for reasons of prioritization. For a generic solution, there's also ImageSearch [3] in more recent Windows AHK versions, which I'd like to have at some point too. Some OCR command would be cool as well.

This will all take a while though, if nobody else joins in.

Thanks for your viewpoints though, I agree with the sibling comment that these insights are valuable.

[1] https://www.autohotkey.com/docs/commands/ControlClick.htm [2] https://github.com/phil294/AHK_X11/issues/3 [3] https://www.autohotkey.com/docs/commands/ImageSearch.htm


Out of curiosity is there a reason you're not just biting the bullet and going with a more accessibility-focused proprietary OS? Seems like a big compromise for supporting FOSS, so I'm sure there's more to it. Always interested in hearing accessibility-involved use cases as an interface designer.


By "a more accessibility-focused proprietary OS", do you mean MS Windows? Or macOS? Or both? So far, the strategy of controlling X Window apps via piecemeal scripts seems to be paying off OK. The ideal would be full voice control of all UI interactions at an application level. Free-form text entry in, say, an editor is easier to deal with. It is handled by injecting keyboard events from third-party speech recognition tools that allow for correction prior to final confirmation if required.

If PowerShell or AppleScript have significantly better capabilities in this arena (or if there is an alternative tool I should be looking at, or other resources that might be useful in the quest), it would be great to hear. At the end of the day, though, the end user is a dev and wants a Linux desktop.


Pretty sure Mac has some similar stuff, but this is the Windows 10/11 built-in voice command functionality.

https://support.microsoft.com/en-us/windows/windows-speech-r...

I imagine someone could use their computer almost entirely through that, save for things that require precise mouse usage like photo editing or video games. Even then, you can achieve rough mouse usage through their on-screen grid, so you can do some things.


I know that Windows is generally the choice for blind users because of its mature, well-established accessibility framework.


You could probably do it pretty easily for some dialogs, i.e. those drawn by some framework or other. But good luck doing it for all the other frameworks too. Your thought is probably the most effective...


> this must be a solved problem

Yes, this is called a11y (accessibility). Install at-spi; run xwininfo, dogtail/sniff, accerciser or qdbusviewer to find dialogues and buttons; then use any of the desktop automation tools to press them.

> buttons based on their label (OK/Cancel etc.)

That only works for English. Automate based on the type, not the label.
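
For what it's worth, here is a rough sketch of "automate based on the type" in Python. It assumes the pyatspi bindings for AT-SPI are installed and that the target applications actually expose an accessibility tree; neither is guaranteed. The helper names (`dialog_buttons`, `press`, `list_live_dialog_buttons`) and the role strings are my own assumptions for illustration, and role names can differ between toolkits and AT-SPI versions.

```python
from typing import Any, Iterator, List, Tuple

# Assumption: role strings vary by toolkit and AT-SPI version
# (e.g. "push button" vs "button"), so both sets are configurable.
DIALOG_ROLES: Tuple[str, ...] = ("dialog", "alert")
BUTTON_ROLES: Tuple[str, ...] = ("push button", "button")


def dialog_buttons(
    node: Any,
    dialog_roles: Tuple[str, ...] = DIALOG_ROLES,
    button_roles: Tuple[str, ...] = BUTTON_ROLES,
    in_dialog: bool = False,
) -> Iterator[Any]:
    """Depth-first walk yielding buttons that sit inside a dialog.

    Works on any object exposing getRoleName(), childCount and
    getChildAtIndex(i), which pyatspi accessibles do.
    """
    if node is None:  # live trees can contain missing children
        return
    role = node.getRoleName()
    if in_dialog and role in button_roles:
        yield node
    in_dialog = in_dialog or role in dialog_roles
    for i in range(node.childCount):
        yield from dialog_buttons(
            node.getChildAtIndex(i), dialog_roles, button_roles, in_dialog
        )


def press(button: Any) -> bool:
    """Invoke the button's first action; False if it has none."""
    try:
        action = button.queryAction()
    except NotImplementedError:  # object has no Action interface
        return False
    if action.nActions < 1:
        return False
    return bool(action.doAction(0))


def list_live_dialog_buttons() -> List[str]:
    """Names of dialog buttons on the running desktop (needs pyatspi)."""
    import pyatspi  # imported lazily so the helpers above work without it

    desktop = pyatspi.Registry.getDesktop(0)
    return [button.name for button in dialog_buttons(desktop)]
```

Calling `list_live_dialog_buttons()` with a dialog open should show what the tree exposes; from there you would pick a button by position or by name and hand it to `press`. Applications that draw their own widgets without a11y support will simply not show up, which is the case the image-based approaches are for.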


Thank you for the a11y tip, that's really helpful!

The point around internationalisation is well made. To be clear, I'm looking to implement a generic solution that potentially works for any application on the desktop. I do not have the source code for the underlying applications, so I'm not sure how/if I can discover the button type 'externally'? A config file per-application would be acceptable, though, and could address the language related issues, especially as they could then potentially be crowd-sourced.
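
To make the config-file idea concrete, here is a minimal sketch in Python of how a per-application label map might resolve a button's visible text to a language-neutral action. The JSON layout, the application name `exampleapp`, the label lists and the function names are all hypothetical, made up purely for illustration.

```python
import json
from typing import Dict, Optional

# Hypothetical config: a default section plus per-application overrides.
CONFIG_TEXT = """
{
  "default": {"ok": ["OK"], "cancel": ["Cancel"]},
  "apps": {
    "exampleapp": {"ok": ["Speichern"], "cancel": ["Abbrechen"]}
  }
}
"""


def normalise(label: str) -> str:
    """Drop mnemonic markers ('_' and '&'), trim and case-fold a label."""
    return label.replace("_", "").replace("&", "").strip().casefold()


def build_lookup(config: dict, app: str) -> Dict[str, str]:
    """Map normalised label -> action, app entries overriding defaults."""
    lookup: Dict[str, str] = {}
    sections = (config.get("default", {}), config.get("apps", {}).get(app, {}))
    for section in sections:
        for action, labels in section.items():
            for label in labels:
                lookup[normalise(label)] = action
    return lookup


def resolve(config: dict, app: str, label: str) -> Optional[str]:
    """Return the action for a button label, or None if unknown."""
    return build_lookup(config, app).get(normalise(label))


config = json.loads(CONFIG_TEXT)
```

A voice command such as "cancel" would then be matched against `resolve(config, app, button_label)` for each button found in the dialog, so the spoken vocabulary stays fixed while the per-application files absorb the translations.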



