What the UI looks like has no effect on, for example, the Windows UI Automation libraries. They query the process directly for a semantic description of its elements: here's a button called 'Delete', here's a list of TODO items, and so on. You get the tree structure straight from the API.
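For a concrete feel, here's a minimal sketch using pywinauto, a Python wrapper over UIA (the library choice and the Notepad window title are just assumptions for illustration). It dumps each element's control type, name, and bounding rectangle; those rectangles are exactly the kind of pixel-to-semantics mapping you'd use to annotate screenshots.

```python
# Minimal sketch using pywinauto's UIA backend (pip install pywinauto).
# Assumes a Notepad window is open; swap in any title regex you like.
from pywinauto import Desktop

# Attach to the top-level window through UI Automation -- no pixels involved.
win = Desktop(backend="uia").window(title_re=".*Notepad.*")

# Walk the accessibility tree and print each element's semantic description:
# control type ("Button", "ListItem", ...), name, and on-screen rectangle.
for ctrl in win.descendants():
    print(ctrl.element_info.control_type,
          repr(ctrl.window_text()),
          ctrl.rectangle())
```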
Even if they are working off of screenshots, I wouldn't be surprised if they still trained their models on screenshots annotated by those same automation libraries, which told the AI which pixel is what.