Yeah the ways AI models have learned to interact with the world are all hilariously skeuomorphic given their capabilities. A model that runs on silicon has to learn English and Python in order to communicate with other models that also run on silicon. And to perceive the world they have to rely on images rendered in the limited wavelengths visible to the human eye.
But I much prefer this approach over allowing models to develop their own hyper-optimized information exchange protocols that are a black box to humans, and I hope things stay this way forever.
The way we plan to handle authenticated sessions is through a secret management service with the ability to ping an endpoint to check if the session is still valid, and if not, run a separate automation that re-authenticates and updates the secret manager with the new token. In that case, it wouldn't need to be stateful, but I can certainly see a case for statefulness being useful as workflows get even more complex.
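Roughly, that flow looks something like the sketch below - the endpoint, secret names, and re-auth hook are placeholders for illustration, not our actual implementation:

```python
import requests

SECRET_NAME = "vendor_session_token"  # hypothetical key in the secret manager

def get_valid_token(secrets, reauthenticate):
    """Return a working session token, re-authenticating if the stored one is stale."""
    token = secrets.get(SECRET_NAME)

    # Ping a validity endpoint with the stored token.
    resp = requests.get(
        "https://vendor.example.com/api/session/check",  # placeholder URL
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    if resp.status_code == 200:
        return token

    # Session is no longer valid: run the separate re-auth automation
    # and write the new token back to the secret manager.
    new_token = reauthenticate()
    secrets.set(SECRET_NAME, new_token)
    return new_token
```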
As for device telemetry, my experience has been that most companies don't rely too heavily on it. Any heuristic used to identify bots is likely to have a high false positive rate and catch many legitimate users, who then complain about it. Captchas are much more common and effective, though if you've seen some of the newer puzzles that vendors like Arkose Labs offer, it's a tossup whether a person of median intelligence can even solve them.
I mentioned this in another comment, but I know from experience that it's impossible to reliably differentiate bots from humans over a network. And since the right to automate browsers has survived repeated legal challenges, all vendors can do is raise the bar incrementally, which only weeds out the low-sophistication actors.
This actually creates an evergreen problem that companies need to overcome, and our paid version will probably involve helping companies overcome these barriers.
Also, I should clarify that we're explicitly not trying to build a Playwright abstraction - we're trying to remain as unopinionated as possible about how developers code the bot, and just help with the network-level infrastructure they'll need to make it reliable and make it scale.
It's good feedback for us, we'll make that point more clear!
> but I know from experience that it's impossible to reliably differentiate bots from humans over a network
While this might be true in theory, it doesn't stop them from trying! And believe me, it's getting to the point where the WAF settings on some websites are even annoying the majority of real users! Some of the issues I'm hinting at, however, are fundamental issues you run into when automating the web with any mainstream browser that hasn't had some source code patches. I'm curious to see whether a solution to that will be part of your service if you decide to tackle it.
Don't take this the wrong way, but this is the kind of unethical behavior that our industry should frown upon IMO. I view this kind of thing on the same level as DDoS-as-a-Service companies.
I wish your company the kind of success it deserves.
Why is it unethical when courts have repeatedly affirmed browser automation to be legal and permitted?
If anything, it's unethical for companies to dictate how their customers can access services they've already paid for. If I'm paying hundreds of thousands per year for software, shouldn't I be allowed to build automations over it? Instead, many enterprise products go to great lengths to restrict this kind of usage.
I led the team that dealt with DDoS and other network level attacks at Robinhood so I know how harmful they are. But I also got to see many developers using our services in creative ways that could have been a whole new product (example: https://github.com/sanko/Robinhood).
Instead we had to go after these people and shut them down because it wasn't aligned with the company's long term risk profile. It sucked.
That's why we're focused on authenticated agents for B2B use cases, not the kind of malicious bots you might be thinking of.
Depends on the use case. Lots of hospitals and banks use RPA to automate routine processes on their EHRs and systems of record, because these kinds of software typically don't have APIs available. Or if they do, they're very limited.
Playwright and other browser automation scripts are a much more powerful version of RPA, but they do require some knowledge of code. But there are more and more developers every year and code just gets more powerful every year. So I think it's a good bet that browser automation in code will replace RPA altogether some day.
Thanks! Wasn't familiar with Browserless but took a quick look. It seems they're very focused on the scraping use case; we're more focused on the agent use case. One of our first customers turned us on to this - they wanted to build an RPA automation to push data to a cloud EHR. The problem was that it ran as a single-page application with no URL routing, and its backend API was extremely complex and difficult to reverse engineer. So automating the browser was the best way to integrate.
If you're trying to build an agent for a long-running job like that, you run into different problems:
- Failures are magnified, because an agent workflow has multiple upstream dependencies while most scraping jobs don't.
- You have to account for different auth schemes (OAuth, password, magic link, etc.)
- You have to implement token refresh logic for when sessions expire, unless you want to manually log in several times per day (rough sketch of what I mean below)
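On that last point, here's a rough sketch - the names and exception type are made up for illustration, not our API:

```python
import functools

class SessionExpired(Exception):
    """Raised by a workflow step when the app bounces it back to the login page."""

def with_refresh(login, max_retries=1):
    """Retry a step after re-running the login flow if the session has lapsed."""
    def decorator(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return step(*args, **kwargs)
                except SessionExpired:
                    if attempt == max_retries:
                        raise
                    login()  # e.g. replay the OAuth / password / magic link flow
        return wrapper
    return decorator
```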
We don't have most of these features yet, but it's where we plan to focus.
And finally, we've licensed Finic under Apache 2.0 whereas Browserless is only available under a commercial license.
Sounds like a problem that can be solved with a Playwright script with a bit of error checking in it.
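Something along these lines, i.e. stock Playwright plus a retry loop (the URL and selectors are obviously placeholders):

```python
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

def push_record(record: dict, attempts: int = 3) -> None:
    """Fill one form in the target app, retrying on timeouts."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            for attempt in range(1, attempts + 1):
                try:
                    page.goto("https://ehr.example.com/app")    # placeholder URL
                    page.fill("#patient-name", record["name"])  # placeholder selectors
                    page.click("button#save")
                    page.wait_for_selector("text=Saved", timeout=10_000)
                    return
                except PlaywrightTimeout:
                    if attempt == attempts:
                        raise  # out of retries, surface the failure
        finally:
            browser.close()
```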
I think this needs more elaboration on what the Finic wrapper is adding to stock Playwright that can't just be achieved through more effective use of stock Playwright.
I recently implemented something for a use case similar to what they described. Making something like that work robustly is actually quite a bit more effort than a Playwright script with a bit of error checking. I haven't tried the product, but if it does what it claims on the back of the box, it would be quite valuable, if for nothing more than the time savings of figuring it all out on your own.
Proxies are definitely on our roadmap, but for now it just supports stock Playwright.
Thanks for the feedback! I just updated the repo to make it more clear that it's Playwright based. Once my cofounder wakes up I'll see if he can re-record the video as well.
Yep. I used to be the guy responsible for bot detection at Robinhood so I can tell you firsthand it's impossible to reliably differentiate between humans and machines over a network. So either you accept being automated, or you overcorrect and block legitimate users.
I don't think the dead internet theory is true today, but I think it will be true soon. IMO that's actually a good thing, more agents representing us online = more time spent in the real world.
A close friend of mine was an associate of Mr Beast, even living in his house in Greenville for several months. He confirmed a lot of the negative press about him in the media, and was himself ultimately screwed over by Jimmy and has been trying to get recompense for years.
MrBeast has always been clear that his goal is to make the best videos in the world. Not to be the most nurturing place to work, or the most philanthropically minded. This document makes that clear. It shouldn't come as a surprise to anyone that in becoming the best in the world at YouTube, he's had to become an extremely toxic individual.
There have been a lot of "chat with your X" projects, and the value prop always eludes me.
To use the example in the repo, if I want to know what image encoders are supported, I would do a repository search for the "Encoder" keyword to find where they're defined. Then I'd be able to see all the encoders that are supported. That takes me about 10 seconds - why would I want to use a chatbot to do this instead?