
Related question - what is a very fast and easy to use library for scraping static sites such as Google search results?



Google search isn't a static site; the results are dynamically generated based on what Google knows about you (location, browser language, recent searches from your IP, recent searches from your account, and everything else it has gathered while selling ad slots to that device).

That being said, there isn't anything wrong with using Scrapy for this. If you're more familiar with web browsers than with Python, something like https://github.com/puppeteer/puppeteer can also be turned into a quick way to scrape a site: it gives you a headless browser controlled by whatever you script in Node.js.
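For what it's worth, here's a minimal sketch of the Scrapy route. The start URL and CSS selectors are placeholders I haven't tested against Google (its markup changes constantly, and you'll also hit consent pages and blocking), so treat this as shape-of-the-code only:

    # Run with: scrapy runspider search_spider.py -o results.json
    import scrapy

    class SearchSpider(scrapy.Spider):
        name = "search"
        start_urls = ["https://www.google.com/search?q=web+scraping"]

        def parse(self, response):
            # One item per result block; "div.g" is a commonly cited but unstable selector.
            for result in response.css("div.g"):
                yield {
                    "title": result.css("h3::text").get(),
                    "link": result.css("a::attr(href)").get(),
                }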


I see. I am familiar with Python, but I don't need something as heavy as Scrapy. Ideally I am looking for something very lightweight and fast that can just parse the DOM using CSS selectors.


I've had excellent luck with SerpAPI. It's $50 a month for 5,000 searches, which has been plenty for my needs at a small SEO/marketing agency.

http://serpapi.com
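The basic call is just an HTTP GET that returns JSON. Something like the sketch below; the parameter names are from memory, so double-check them against their docs, and the API key is obviously a placeholder:

    import requests

    params = {
        "engine": "google",           # which search engine to query
        "q": "coffee shops seattle",  # the search query
        "api_key": "YOUR_API_KEY",    # placeholder
    }
    resp = requests.get("https://serpapi.com/search.json", params=params, timeout=30)
    resp.raise_for_status()

    # Organic results come back as structured JSON, no HTML parsing needed.
    for result in resp.json().get("organic_results", []):
        print(result.get("position"), result.get("title"), result.get("link"))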


As others have said, Google isn't a static site, and on top of that, its markup is a nightmare of tags and whatnot that makes it utterly horrific to scrape.

After scraping tens of millions of pages, possibly hundreds of millions, I've fallen back to lxml with Python. It's not for all use cases, but it works for me.
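A rough sketch of what that looks like in practice; the URL and XPath expressions are illustrative only, real pages need their own selectors:

    import requests
    from lxml import html

    resp = requests.get("https://example.com/some-page", timeout=30)
    doc = html.fromstring(resp.content)

    # XPath tends to hold up better than CSS on messy markup, and lxml is fast.
    titles = doc.xpath("//h2/a/text()")
    links = doc.xpath("//h2/a/@href")
    for title, link in zip(titles, links):
        print(title.strip(), link)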

One thing I'll do before scraping a page is check whether it's rendered server-side or client-side. If it's client-side, I'll see if I can just get the raw data directly, which makes things much, much easier.
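A quick-and-dirty way to check: fetch the page without executing any JavaScript and see whether the content you care about is in the raw HTML. The URL and marker text here are placeholders:

    import requests

    URL = "https://example.com/listing"
    MARKER = "Expected product name"  # text you know appears in the rendered page

    raw_html = requests.get(URL, timeout=30).text
    if MARKER in raw_html:
        print("Looks server-rendered: parse the HTML directly.")
    else:
        # Likely client-rendered; check the browser's network tab for the
        # JSON/XHR endpoint feeding the page and hit that instead.
        print("Content not in raw HTML: look for the underlying data endpoint.")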



