How do tools like this cope with pages that are rendered by JavaScript? What do the tools actually save? For instance, if I save a Quora page using Firefox I can open it, but it doesn't work if Quora is not accessible.


ArchiveBox is a wrapper around ~12 different extractor modules, each of which saves the page or its assets in a different way. The one most relevant to JS is SingleFile, which renders the page in headless Chrome, lets its JS run for a few seconds, and then snapshots the DOM with all assets inlined. It's not perfect, but it works well for the majority of JS-heavy sites.
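To illustrate the "assets inlined" part, which is what lets a snapshot open without the original site being reachable, here is a toy Python sketch. It assumes the rendered DOM HTML and the bytes of each referenced image have already been captured (in practice a headless browser would supply both). `inline_assets` and `to_data_uri` are hypothetical helpers, not SingleFile's or ArchiveBox's actual code, and only `<img src>` is handled.

```python
import base64
import mimetypes
import re

# Toy matcher for <img ... src="..."> (single or double quotes).
# Hypothetical: a real tool walks the live DOM instead of regex-matching HTML.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2', re.IGNORECASE)


def to_data_uri(url: str, data: bytes) -> str:
    """Encode asset bytes as a base64 data: URI, guessing the MIME type from the URL."""
    mime, _ = mimetypes.guess_type(url)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def inline_assets(dom_html: str, assets: dict[str, bytes]) -> str:
    """Replace each <img src> that has captured bytes with an inline data: URI.

    `dom_html` is assumed to be the DOM serialized after the page's JS has run;
    `assets` maps each src URL to the bytes captured for it.
    """
    def replace(m: re.Match) -> str:
        prefix, quote, url = m.group(1), m.group(2), m.group(3)
        if url not in assets:
            return m.group(0)  # not captured: still points at the network
        return f"{prefix}{quote}{to_data_uri(url, assets[url])}{quote}"

    return IMG_SRC.sub(replace, dom_html)


if __name__ == "__main__":
    snapshot = '<div id="answer">Rendered by JS</div><img alt="chart" src="chart.png">'
    print(inline_assets(snapshot, {"chart.png": b"\x89PNG\r\n"}))
```

The inlined images no longer reference any external host, so they still render offline. Anything left un-inlined, including data the page's JS would request later, still depends on the network. A real tool would also need to handle stylesheets, fonts, `srcset`, and CSS `url()` references.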

For the very complex sites that really rely on a ton of interactive JS or dynamic requests to APIs to render their content, check out https://ArchiveWeb.page + https://ReplayWeb.page by https://webrecorder.io.
