Great project. The biggest question for me when I'm using phantomjs is why phantomjs is trying to replicate nodejs infrastructure. For example, phantomjs has an HTTP server feature for processing incoming requests. This doesn't make sense to me because a browser shouldn't be a server. If you need to get information out of the worker, you should POST it somewhere. The proclivity of phantomjs users to prefer stdout is astounding. It's definitely the #1 question or issue that I field in #phantomjs on freenode.
> There are similar "glues" like phantomjs-node that integrate phantomjs by
> spawning a process, and processing the stdout stream, but it is limited by
> what can be done via the command line of phantomjs. If you really want direct
> api access to the browser, the best way is via direct integration.
People have been doing that for a surprisingly long time. I think it was encouraged by the fact that CoffeeScript is immediately intelligible to a lot of JavaScript coders, and authors reckon that anyone who has trouble can just compile it and get fairly idiomatic JavaScript.
I actually wrote a bookmarklet a while back that would look for CoffeeScript snippets on a page and translate them for people who find it troublesome, but didn't end up doing much with it because I didn't feel like there was much interest.
criticisms of coffeescript aside (e.g. interstitial whitespace sensitivity), the code can't directly be applied in node (as far as i can tell, you can't tell node to run a coffeescript file directly using the `node <script>` syntax -- you have to use a framework or compile to js).
If you don't want to compile, assuming you have CoffeeScript installed, you can just run it with `coffee` instead of `node` (e.g. `coffee script.coffee`) and everything else works like normal.
remember: java is to javascript as pain is to painting.
it's not an apt comparison. You can directly write javascript and run it in your browser or in node (or test it out interactively in the node REPL).
I like to play with modules in node interactively (when relevant) because it's easier to see what's going on and much easier to iterate (esp. in conjunction with the .load REPL command)
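As a rough illustration of that workflow, here is a minimal sketch of a scratch file you could pull into the REPL. The file name `scratch.js` and the `describe` helper are made up for this example; only `.load` and the built-in `path` module are real.

```javascript
// scratch.js: a hypothetical scratch file for REPL experiments.
// `var` is used so that re-running `.load scratch.js` after an edit
// never trips over a redeclaration.
var nodePath = require('path');

// Break a file path into the pieces you usually want to eyeball.
function describe(file) {
  return {
    dir: nodePath.dirname(file),
    base: nodePath.basename(file),
    ext: nodePath.extname(file)
  };
}
```

Start `node`, type `.load scratch.js`, then call `describe('/tmp/shot.png')` at the prompt. Edit the file, `.load` it again, and repeat.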
Great work! This might even have potential for browser-based testing, since mocha-phantomjs runs from an executable; I'd prefer a code-based solution like Chimera that integrates with Mocha.
If you want to parse the DOM for the internet at large, you need a real browser. There are simply too many sites with really bad HTML to be parsed reliably with anything else.
It's merely an integration library for the netsurf browser. It's the html parser for a real browser. I considered using the parser from other browsers like firefox or webkit, but netsurf had the fewest external dependencies.
phantomjs-node uses ridiculous hacks. Not being rude but I think the author was drunk when writing that. I ended up wasting quite some time when trying to customize it.
A README is the absolute bare minimum for a GitHub project. Even a lack of code can get some interest in an idea. Someone landing on the GitHub page will reflexively look for the back button if they haven't seen the blog post.
Has anyone actually gotten this working? I've tried installing on Mac OS X and Ubuntu, both with various problems. The precompiled binaries don't work, the qt build scripts fail, etc. etc.
Sorta. The joy of PhantomJS (and of Chimera) is that they use a real browser to run the JS/CSS/HTML/whatever. No simulated DOM; no simulated cookies; etc. Just a real [headless] WebKit browser with all of its quirks and tricks. You can even take screenshots of a real, rendered webpage (which is great for debugging).
For example, for POSTing and reading from redis/resque I wrote this (proof of concept, not what's in production):
https://gist.github.com/000037f472b72d9490a6
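To be clear, the following is not what's in that gist. It is a separate, minimal sketch of the receiving side of "POST it somewhere": a small node server that accepts JSON results from workers. The names `parseResult` and `createCollector` and the port in the usage note are assumptions made up for illustration.

```javascript
var http = require('http');

// Parse one POSTed body. Returns the parsed JSON object or array,
// or null for anything else (invalid JSON, bare strings, numbers).
function parseResult(body) {
  try {
    var parsed = JSON.parse(body);
    return parsed !== null && typeof parsed === 'object' ? parsed : null;
  } catch (err) {
    return null;
  }
}

// Build (but do not start) a server that appends each valid result
// to the `results` array supplied by the caller.
function createCollector(results) {
  return http.createServer(function (req, res) {
    if (req.method !== 'POST') {
      res.writeHead(405);
      return res.end();
    }
    var body = '';
    req.setEncoding('utf8');
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      var result = parseResult(body);
      if (result) results.push(result);
      res.writeHead(result ? 204 : 400);
      res.end();
    });
  });
}
```

Something like `createCollector([]).listen(8080)` would start it (the port is arbitrary). The worker then only needs to be able to make an HTTP request, and nobody has to scrape stdout.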
A few thoughts..
This seems like a lot of overhead on top of a phantomjs (or even just a generic webkit) worker. Substack's approach was to just put a proxy in front of a browser that injects a <script> tag into the page to boss the browser around: https://github.com/substack/schoolbus
Supposedly the actual browser client shouldn't matter, as long as your fleet of workers are up and running. I bet chimera's approach will end up with more access to npm modules in the long run compared to phantomjs.
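To make the proxy idea concrete, here is a minimal sketch of just the injection step, i.e. what a proxy could do to an HTML response body before passing it on to the browser. This is not schoolbus's actual code; `injectScript`, `escapeAttr` and the `/boss.js` path are hypothetical names for illustration.

```javascript
// Escape the characters that would break out of a double-quoted attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Insert a <script> tag just before the last </body> (case-insensitive).
// If the page has no </body>, append the tag at the end instead.
function injectScript(html, src) {
  var tag = '<script src="' + escapeAttr(src) + '"></script>';
  var re = /<\/body>/gi;
  var match;
  var idx = -1;
  while ((match = re.exec(html)) !== null) {
    idx = match.index; // remember the last closing body tag
  }
  if (idx === -1) return html + tag;
  return html.slice(0, idx) + tag + html.slice(idx);
}
```

The injected script is what would talk back to whatever is driving the workers, which is why the browser behind the proxy can in principle be anything.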
Also, the link wasn't in the article: https://github.com/deanmao/node-chimera
For the python equivalent of this project, there's https://github.com/kanzure/pyphantomjs