gavino's comments

gavino · on Dec 19, 2018

Sure! The backend is actually pretty straightforward: it's a NextJS app deployed on Now with a few added endpoints to handle the incoming GraphQL queries.

Then for actually turning the query into a digestible output, I used the GraphQL schema builder, which accepts HTML nodes from the requested page and grabs the right variables.

gavino · on Dec 19, 2018

I remember seeing GDOM a while back when I first started this project, but forgot to write it down as a source of inspiration. I'm gonna add all of these as alternatives, because they're all great :D

syrusakbary · on Dec 19, 2018

So happy to read that :) (and so glad it's served as source of inspiration for your project, keep up the good work!)

gavino · on Dec 19, 2018

Troy and Abed scraping websites!! :D

_opc6 · on Dec 20, 2018

in the morning!

gavino · on Dec 19, 2018

Nah, I don't really plan on turning it into a company. I'd gladly accept any PR to swap out cheerio, I haven't touched that part in close to a year :D

gavino · on March 26, 2018

That link doesn't work for me, but this one does https://www.textile.photos/ :)

andrewxhill · on March 26, 2018

thanks! updated in comment too

gavino · on June 1, 2016

Location: Reno, NV

Remote: Yes (tentatively)

Willing to relocate: Yes

Technologies: javascript, react.js, angular, node.js, rails, html, css (scss, less), react native

Résumé/CV: On request

Email: gavin@gavin.codes

Portfolio: http://www.gavin.codes

Github: http://www.github.com/gavindinubilo

gavino · on April 28, 2016

It doesn't currently do that. I think it'd be an interesting challenge to try, though, and it's definitely possible.

daw___ · on April 28, 2016

Yep. Have a look at phantomjs [1], or other phantomjs wrappers like casperjs [2].

[1] https://www.npmjs.com/package/phantomjs

[2] https://www.npmjs.com/package/casperjs

etatoby · on April 28, 2016

> interesting challenge

Understatement of the year.

You'd need to either re-implement an entire browser stack or run a headless version of Gecko or WebKit server-side.

The former entails millions of man-hours of work. The latter opens up your server to all sorts of exploits. Overall a really bad idea.

Besides, single page applications are the worst junk in the entire Web 2.0 cesspool. If you really need to scrape them, they usually come with their own JSON API which you can just piggyback.

OJFord · on April 28, 2016

> entails millions of man-hours of work

Overstatement of the year.

Why on Earth would the OP start from scratch? Besides, though not a solo and OSS effort, Apifier does this; certainly without "millions" of hours having been spent on it.

gavino · on April 28, 2016

I had been trying to figure out what was causing this issue, thanks for pointing it out. I've pushed a quick fix so the response now tells you whether the JSON is invalid or a CSS selector wasn't found on the provided URL.

gavino · on April 28, 2016

Ah, yep, you're right, forgot to change the URL. Updated now. Thanks for letting me know.

jstanley · on April 28, 2016

And to get the HN post titles:

curl -d url=https://news.ycombinator.com/ -d json_data='{"title":[{"elem":".title > a","value":"text"}]}' http://www.jamapi.xyz/

This is cool :)

EDIT:

Incidentally, you don't really need to have that "index" key inside the values of an array, because in an array the order is preserved anyway. Unless I've misunderstood what it means?

pmontra · on April 28, 2016

Titles and links grouped together:

curl -X POST http://www.jamapi.xyz/ -d url=http://news.ycombinator.com -d json_data='{"title": "title","paragraphs": [{ "elem": "td.title a", "value": "text", "location": "href"}]}'

Use the http URL to call www.jamapi.xyz, because calling it over https I get error code SSL_ERROR_BAD_CERT_DOMAIN.
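
For anyone calling it from a script rather than the shell, here is a rough Python sketch of the same request. It only builds the form-encoded POST and does not send it. `build_jam_request` is a made-up helper name, and the endpoint and field names (`url`, `json_data`) are copied from the curl examples in this thread, so treat them as assumptions about a service that may have changed since.

```python
import json
import urllib.parse
import urllib.request

JAM_ENDPOINT = "http://www.jamapi.xyz/"  # taken from the curl examples in this thread


def build_jam_request(page_url, selectors, endpoint=JAM_ENDPOINT):
    """Build (but do not send) the form-encoded POST that `curl -d` would make.

    build_jam_request is a hypothetical helper, not part of Jam API itself.
    """
    form = urllib.parse.urlencode({
        "url": page_url,
        "json_data": json.dumps(selectors),
    })
    return urllib.request.Request(endpoint, data=form.encode("utf-8"), method="POST")


selectors = {
    "title": "title",
    "paragraphs": [{"elem": "td.title a", "value": "text", "location": "href"}],
}
request = build_jam_request("http://news.ycombinator.com", selectors)

# Sending is left to the caller, e.g.:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Unlike a bare `curl -d`, `urlencode` percent-encodes the JSON, which avoids surprises if a selector ever contains `&` or `=`.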

gavino · on April 28, 2016

Regarding the "index" key, there are some JSON parsers for languages like Swift that will rearrange your JSON. By adding the index key, you'll still be able to sort after parsing.
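
A small sketch of what sorting on that key could look like after parsing. The response shape is assumed from the description above (each array item carrying an "index" next to its values), and the sample data is made up.

```python
import json

# Hypothetical response, deliberately out of order to mimic a parser
# that did not keep the original arrangement.
raw = (
    '{"title": ['
    '{"index": 2, "value": "third"}, '
    '{"index": 0, "value": "first"}, '
    '{"index": 1, "value": "second"}]}'
)

data = json.loads(raw)

# Restore the page order by sorting on the "index" key.
titles = sorted(data["title"], key=lambda item: item["index"])
values = [item["value"] for item in titles]
```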

Also, thanks, it's really cool to see people liking this :)

JelteF · on April 28, 2016

They might rearrange keys in a JSON object, but in an array the order should be preserved according to the spec[1]. If Swift does this (which I can't really check) then this would be a bug.

[1] http://www.json.org/: An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).

chriswarbo · on April 28, 2016

Yes, the order of elements in an array should always be preserved. For example, we might be expecting the first element to be a name, the second to be a date of birth, etc. We should use an object for that, but that's for reasons of readability, extensibility, etc. rather than array semantics being unsuitable.

Also, jq has a `--sort-keys` option which tries to make the output as reproducible/canonical as possible. From the manual:

> The keys are sorted "alphabetically", by unicode codepoint order. This is not an order that makes particular sense in any particular language, but you can count on it being the same for any two objects with the same set of keys, regardless of locale settings.

It would be strange for a JSON tool to go to such lengths to normalise data, if array order were unpredictable.
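
A quick way to see the distinction, sketched with Python's standard `json` module rather than jq or Swift: sorting keys changes the order of object members but leaves array elements where they were.

```python
import json

doc = '{"b": [3, 1, 2], "a": "x"}'

parsed = json.loads(doc)

# sort_keys reorders object members only; the array keeps its order.
canonical = json.dumps(parsed, sort_keys=True)
print(canonical)
```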