
I used to semi-automate access to some sites by using Selenium with a non-headless browser. These were sites where there were just one or two pages where I wanted some automation to fill out a form or scrape some data, and they frequently made changes to the home page that made it hard to automate navigating from the home page to the pages I wanted to automate.

The idea was to have a script use Selenium to launch non-headless Chrome and then wait:

  from selenium.webdriver import Chrome

  driver = Chrome()  # opens a visible (non-headless) Chrome window
  driver.get("https://example.org")
  input("Press enter when ready")  # block until I've set things up by hand

I could then manually deal with logging in, answer any CAPTCHA that came up, and navigate to the page I wanted to run my automation on. Then I could press "enter" in my terminal and my script would continue.
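For anyone who wants to reuse the pattern, here is a minimal sketch of it wrapped in a function. The names `pause_then_run` and `page_title` are made up for illustration, and the sketch assumes only that the driver object has `get()`, `title`, and `quit()`, as Selenium's WebDriver does. It is one way to arrange it, not necessarily how my original script looked.

```python
from typing import Any, Callable


def pause_then_run(
    driver: Any,
    start_url: str,
    automation: Callable[[Any], Any],
    wait: Callable[[str], Any] = input,
) -> Any:
    """Open start_url, wait for the human, then automate the current page.

    `driver` is assumed to be a Selenium WebDriver (or anything with the
    same get()/quit() methods). `wait` defaults to input(), so the script
    blocks until enter is pressed in the terminal.
    """
    driver.get(start_url)
    # Log in, solve any CAPTCHA, and navigate by hand while this blocks.
    wait("Press enter when ready")
    try:
        # The automation runs against whatever page the browser is on now.
        return automation(driver)
    finally:
        # Close the browser even if the automation raises.
        driver.quit()


def page_title(driver: Any) -> str:
    """Hypothetical automation step: read the title of the current page."""
    return driver.title


# Usage with real Selenium (requires the selenium package and Chrome):
#   from selenium.webdriver import Chrome
#   print(pause_then_run(Chrome(), "https://example.org", page_title))
```

Passing the automation in as a function keeps the fragile part (getting from the home page to the target page) manual and the repeatable part scripted.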

That used to work fine, but then on sites using Cloudflare's CAPTCHA it stopped working. Solving the CAPTCHA would just result in another CAPTCHA.

I tried an alternative Selenium Chrome driver that was supposed to be more stealthy, and tried setting various flags that were supposed to keep JavaScript from detecting Selenium. Those worked for a while, but then they stopped working too.

The results were similar using Selenium with Firefox.

I also tried Puppeteer, with Chromium and Firefox, and they too could not get past the CAPTCHA loops.

I then tried Playwright. With Chromium and WebKit I got the same CAPTCHA loops. With Firefox it actually worked. I didn't even see the CAPTCHA. The non-interactive check for not being a bot passed.
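A rough sketch of the same launch-and-wait pattern with Playwright's sync API, parameterized on the browser so the three engines can be compared. `run_with_browser` is a made-up helper name; it assumes it is handed the object that `sync_playwright()` yields, which exposes `chromium`, `firefox`, and `webkit` launchers. Whether any given browser gets past the check is not something the code can promise.

```python
from typing import Any, Callable


def run_with_browser(
    playwright: Any,
    browser_name: str,
    start_url: str,
    automation: Callable[[Any], Any],
    wait: Callable[[str], Any] = input,
) -> Any:
    """Launch a visible browser, pause for manual setup, then automate.

    `playwright` is assumed to be the object yielded by sync_playwright().
    `browser_name` is "chromium", "firefox", or "webkit".
    """
    # Pick the launcher, e.g. playwright.firefox
    browser_type = getattr(playwright, browser_name)
    browser = browser_type.launch(headless=False)
    try:
        page = browser.new_page()
        page.goto(start_url)
        # Handle login and navigation by hand while this blocks.
        wait("Press enter when ready")
        return automation(page)
    finally:
        # Close the browser even if the automation raises.
        browser.close()


# Usage with real Playwright (requires the playwright package and
# installed browsers):
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       title = run_with_browser(
#           p, "firefox", "https://example.org", lambda page: page.title()
#       )
#       print(title)
```

Switching `"firefox"` to `"chromium"` or `"webkit"` is the only change needed to repeat the comparison.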

Still, the whole approach seems fragile. I don't know if Firefox/Playwright working was due to some fundamental difference between Firefox and the others or just Cloudflare having not yet gotten around to dealing with it.
