
Love the tool, wish I had this over a lot of other scrapers on multiple projects.

Is a Chrome extension in the works at all?




Chrome extensions run in a severely restricted environment. While this is arguably good for security, it prevents us from building some of the powerful tools we can build in Firefox. We do plan to eventually release as a standalone app with no browser dependency.


Can you be more specific, please? As far as I know, you can create a Chrome extension that executes custom JavaScript on the current page and passes the results back to the extension. So what exactly is the problem?


Sure, I'll give you one example.

We want to show a sample immediately as a user changes what they extract. On a static website, this is fairly easy. You simply run what the user created on the currently visible page.

However, once interactivity is involved, you can no longer do that. The major problem is operations that are not idempotent. Imagine a click that changes the DOM of a page, and now imagine running the sample on that same page. Re-running the sample may no longer work, because the click could have changed the page in such a way that the extraction fails (e.g. it deletes an element from the page).

To solve this issue, we actually reset a "hidden tab" to the starting state of the page you're on. This happens every time you re-run a sample. Unfortunately, it's not possible to create such hidden tabs with Chrome. We also mess with the cache to make sure that this tab can be reset really quickly, something we couldn't find an API for in Chrome.
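
To make the re-run problem concrete, here is a minimal sketch in plain Python rather than browser code. The page model, the element names, and the helpers `make_page`, `click_load_more`, `run_sample`, and `run_sample_with_reset` are all hypothetical; the only point is that a sample containing a click that mutates the page cannot be repeated on the live page, while running it against a fresh copy of the starting state (a stand-in for the reset hidden tab) gives the same result every time.

```python
import copy


def make_page():
    # Hypothetical stand-in for a loaded page: just a list of element ids.
    return {"elements": ["header", "load-more", "item-1"]}


def click_load_more(page):
    # A non-idempotent interaction: the click removes the button it clicked.
    # list.remove raises ValueError if the button is already gone.
    page["elements"].remove("load-more")
    page["elements"].append("item-2")


def run_sample(page):
    # The user's "sample": perform the click, then extract the items.
    click_load_more(page)
    return [e for e in page["elements"] if e.startswith("item-")]


def run_sample_with_reset(snapshot):
    # Always work on a fresh copy of the starting state,
    # analogous to resetting a hidden tab before each run.
    return run_sample(copy.deepcopy(snapshot))


# With a reset, the sample can be re-run as often as needed.
start = make_page()
first = run_sample_with_reset(start)
second = run_sample_with_reset(start)

# Without a reset, the second run on the same live page fails.
live = make_page()
run_sample(live)
try:
    run_sample(live)
    rerun_on_live_page_failed = False
except ValueError:
    rerun_on_live_page_failed = True
```

In this toy model a deep copy is enough to restore the starting state. A real page also has script state and network side effects, which is presumably why a full tab reset is used instead.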

Hope that answers your question.


Have you had a look at https://developer.chrome.com/extensions/background_pages?

Not sure if it fits all your needs.


Could you clone the DOM into a virtual HTML element and save/serialize that?


That doesn't work because you need the JavaScript as well.
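
A rough illustration of why, again as a toy Python model and not browser code: the `Page` class, its `handlers` dictionary, and `restore_from_markup` are hypothetical names. Serializing only the markup drops the behaviour attached to it, so a click on the restored copy does nothing.

```python
import json


class Page:
    def __init__(self):
        # Markup-like data that can be serialized.
        self.dom = {"button": "Show price", "price": None}
        # Behaviour lives in script state, not in the markup.
        self.handlers = {"button": lambda page: page.dom.update(price="9.99")}

    def click(self, element):
        handler = self.handlers.get(element)
        if handler:
            handler(self)


def restore_from_markup(serialized):
    # Only the markup comes back; no handlers were ever serialized.
    page = Page.__new__(Page)
    page.dom = json.loads(serialized)
    page.handlers = {}
    return page


original = Page()
saved = json.dumps(original.dom)

restored = restore_from_markup(saved)
restored.click("button")  # no handler attached, so nothing changes
original.click("button")  # the handler runs and fills in the price
```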


I see. Thanks for taking the time to explain.


https://scrape.it requires a Chrome extension. We have had no problems with it, nor have we found it "restrictive" in any way.



