This is very cool, impressive work in 2 weeks!
Each action seems to have some delay after it; is there any reason for that? Is it because you are streaming the OpenAI response and performing the actions as they come? If not, I imagine streaming the query response and executing each action as it arrives would speed things up quite a bit?
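Something like this rough sketch, assuming the official openai Node SDK, a newline-delimited action format, and a hypothetical performAction helper (none of which may match your actual setup):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical: executes one parsed action against the page.
async function performAction(action: string): Promise<void> {
  /* ... */
}

async function streamAndExecute(prompt: string): Promise<void> {
  const stream = await client.chat.completions.create({
    model: "gpt-4", // placeholder model
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk.choices[0]?.delta?.content ?? "";
    // Assume one action per line; execute each complete line as soon as it
    // arrives instead of waiting for the full response.
    let newline: number;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) await performAction(line);
    }
  }
  if (buffer.trim()) await performAction(buffer.trim());
}
```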
There are multiple sources of latency: (1) parsing, simplifying and templatizing the DOM, (2) sending it to OpenAI and waiting for the response, (3) actually performing the action (which includes some internal waiting depending on what the actual action is), and (4) a hacky 2-second sleep() after each action to make sure any site changes are reflected in the DOM before running again. :)
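Roughly, one step of the loop looks like this (purely an illustrative sketch; all the names are placeholders, not the actual code):

```typescript
// Placeholder types/functions standing in for the real implementation.
type Page = unknown; // stand-in for a Playwright/Puppeteer-style page handle
declare function summarizeDom(page: Page): Promise<string>;
declare function requestNextAction(objective: string, dom: string): Promise<string>;
declare function performAction(page: Page, action: string): Promise<void>;

async function runStep(page: Page, objective: string): Promise<void> {
  // (1) Parse, simplify, and templatize the DOM -- already fast.
  const domSummary = await summarizeDom(page);

  // (2) Send it to OpenAI and wait for the next single action -- dominated by
  //     network and model latency.
  const action = await requestNextAction(objective, domSummary);

  // (3) Perform the action; some actions include internal waits of their own.
  await performAction(page, action);

  // (4) The hacky fixed sleep, so any resulting page changes are reflected in
  //     the DOM before the next iteration re-reads it.
  await new Promise((resolve) => setTimeout(resolve, 2000));
}
```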
(1) is already very fast. There's a lot of room for improvement in both (3) and (4); we currently set the timeouts very conservatively to improve reliability, but there are a number of heuristics we can use to cut those timeouts substantially in the average case.
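For example, one such heuristic (a sketch only, not what we do today) is to replace the fixed sleep with a "DOM has been quiet for a bit" check, capped at the existing 2-second maximum, run in the page context via something like page.evaluate():

```typescript
// Sketch of one possible heuristic: resolve as soon as the DOM has produced
// no mutations for `quietMs`, falling back to the old `maxMs` cap.
function waitForDomQuiet(quietMs = 250, maxMs = 2000): Promise<void> {
  return new Promise((resolve) => {
    let quietTimer = setTimeout(finish, quietMs);
    const maxTimer = setTimeout(finish, maxMs);

    const observer = new MutationObserver(() => {
      // Any mutation resets the quiet window.
      clearTimeout(quietTimer);
      quietTimer = setTimeout(finish, quietMs);
    });
    observer.observe(document.documentElement, {
      childList: true,
      subtree: true,
      attributes: true,
      characterData: true,
    });

    function finish() {
      observer.disconnect();
      clearTimeout(quietTimer);
      clearTimeout(maxTimer);
      resolve();
    }
  });
}
```

In the common case where the page settles quickly, that turns the 2-second worst case into a few hundred milliseconds while keeping the same upper bound.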
Unfortunately for (2) I'm not sure there's much we can do. We only ask OpenAI for one action at a time to give it a chance to observe the state of the page between actions and correct course as necessary. We experimented with letting it return multiple actions at once but that hurt reliability; we can perform more experiments in that direction but it probably won't be a priority in the short term.
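For context, each round trip is shaped roughly like the sketch below (the real prompt and action schema differ; the model name, interface, and function names here are placeholders):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Illustrative action shape, not the actual schema.
interface Action {
  kind: "click" | "type" | "navigate" | "done";
  target?: string;
  text?: string;
}

async function nextAction(objective: string, domSummary: string): Promise<Action> {
  const completion = await client.chat.completions.create({
    model: "gpt-4", // placeholder model
    messages: [
      {
        role: "system",
        content:
          "You are controlling a browser. Given the objective and the current " +
          "simplified DOM, reply with exactly ONE action as a JSON object.",
      },
      { role: "user", content: `Objective: ${objective}\n\nDOM:\n${domSummary}` },
    ],
  });
  // The model sees the updated page before the next request, so it can correct
  // course if this action did not have the intended effect.
  return JSON.parse(completion.choices[0].message.content ?? "{}") as Action;
}
```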