*> If not, they tend to be trivial to scrape.* Not if the "web page" is just a s...

capableweb · on Jan 14, 2022

Not sure where you have been for the last 10 years but yes, even SPAs are trivial to scrape today. Even better, because so many people build SPAs, they tend to be powered by APIs, so you can just use the API directly instead. And even if you can't, they are still trivial to scrape, even when flooded with JS magic.
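As a rough illustration of the "use the API directly" point, here is a minimal Python sketch. The endpoint shape is an assumption: the `items` and `title` fields and the sample payload are hypothetical, not taken from any real site, and a real SPA backend may need authentication headers or pagination that this ignores.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10.0):
    """Fetch a URL and decode the body as JSON (no browser involved)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def item_titles(payload):
    """Pull the 'title' of each entry under the hypothetical 'items' key."""
    return [item["title"] for item in payload.get("items", [])]


# Hypothetical response body, standing in for what fetch_json() might return.
SAMPLE = json.loads(
    '{"items": [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]}'
)

print(item_titles(SAMPLE))
```

In practice the URL passed to `fetch_json` would be whatever request the SPA itself makes, which can usually be found in the browser's network inspector.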

pdonis · on Jan 14, 2022

> even SPAs are trivial to scrape today

How? (I'm asking about the case where there is no API.)

fauigerzigerk · on Jan 14, 2022

With something like Puppeteer [1]. That said, if we're now headed towards a canvas + WebAssembly world, things could get far more difficult.

[1] https://github.com/puppeteer/puppeteer

pdonis · on Jan 14, 2022

Ah, ok. Yes, if you can remote control a browser you can of course "scrape" anything the browser can load. (Although even here the Puppeteer README says it can load server-side rendered data. Not all single-page web apps do that.) To me that isn't quite the same as having a separate program, independent of the browser, that is able to just load the data from the URL and then operate on it, which is what I'm used to seeing referred to as "scraping".
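To make the distinction concrete, here is a small Python sketch of the browser-independent kind of scraper, one that only parses the markup it is given and never runs JavaScript. Both HTML strings are made-up examples (a server-rendered page and a bare SPA shell), and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the non-empty text fragments found in the raw markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical server-rendered page: the data is in the markup itself.
STATIC_PAGE = "<html><body><h1>Prices</h1><ul><li>Widget: $5</li></ul></body></html>"

# Hypothetical SPA shell: the markup only bootstraps a script.
SPA_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(visible_text(STATIC_PAGE))  # ['Prices', 'Widget: $5']
print(visible_text(SPA_SHELL))    # []
```

The second result is empty because the content only exists after the script runs, which is the case where a driven browser (or the underlying API) is needed.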

fauigerzigerk · on Jan 14, 2022

I think what you are suggesting is often done using browser extensions. It's obviously not independent of the browser, but it serves the use case where you want to extend existing apps by adding interactive features.

My understanding of the word "scraping" is primarily something related to en masse automated data extraction from user interfaces.