Sure! The backend is actually pretty straight forward, it's a NextJS app deployed on Now with a few added endpoints to handle the incoming GraphQL queries.
Then for actually turning the query into a digestable output I used the GraphQL schema builder that handles accepts HTML nodes from the requested page and grabs the right variables.
I remember seeing GDOM a while back when I first started this project, but forgot to write it down as a source of inspiration. I'm gonna add all of these as alternatives, because they're all great :D
You'd need to either re-implement an entire browser stack or run a headless version of gecko of webkit server-side.
The former entails millions of man-hours of work. The latter opens up your server to all sorts of exploits. Overall a really bad idea.
Besides, single page applications are the worst junk in the entire Web 2.0 cesspool. If you really need to scrape them, they usually come with their own JSON API which you can just piggyback.
Why on Earth would the OP start from scratch? Besides, though not a solo and OSS effort, Apifier does this; certainly without "millions" of hours having been spent on it.
I had been trying to figure out what would be causing this issue, thanks for pointing it out, I've pushed a fix real quick that will respond whether JSON is invalid or a CSS selector wasn't found on the provided URL.
Incidentally, you don't really need to have that "index" key inside the values of an array, because in an array the order is preserved anyway. Unless I've misunderstood what it means?
Regarding the "index" key, there are some JSON parsers for languages like Swift that will rearrange your JSON. By adding the index key, you'll still be able to sort after parsing.
Also, thanks, it's really cool to see people liking this :)
They might rearrange keys in a JSON object, but in an array they should be preserved in order as according to the spec[1]. If Swift does this (which I can't really check) than this would be a bug.
[1] http://www.json.org/: An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
Yes, the order of elements in an array should always be preserved. For example, we might be expecting the first element to be a name, the second to be a date of birth, etc. We should use an object for that, but that's for reasons of readability, extensibility, etc. rather than array semantics being unsuitable.
Also, jq has a `--sort-keys` option which tries to make the output as reproducible/canonical as possible. From the manual:
> The keys are sorted "alphabetically", by unicode codepoint order. This is not an order that makes particular sense in any particular language, but you can count on it being the same for any two objects with the same set of keys, regardless of locale settings.
It would be strange for a JSON tool to go to such lengths to normalise data, if array order were unpredictable.
Then for actually turning the query into a digestable output I used the GraphQL schema builder that handles accepts HTML nodes from the requested page and grabs the right variables.