How do tools like this cope with pages that are rendered by JavaScript? What do ...

nikisweeting · on April 20, 2021

ArchiveBox is a wrapper around ~12 different extractor modules, each of which saves the page or its assets in a different way. The most relevant to JS is SingleFile, which renders the page in headless Chrome, lets the JS run for a few seconds, and then snapshots the DOM with all assets inlined. It's not perfect, but it works well even for most JS-heavy sites.
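
To give a rough idea of the "render, wait, then snapshot the DOM" step, here is a minimal Python sketch that shells out to headless Chrome using its `--dump-dom` and `--virtual-time-budget` flags. This is not how ArchiveBox or SingleFile are implemented, and it does not inline assets the way SingleFile does; the function names, the `chromium` binary name, and the 5000 ms budget are all assumptions made for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def build_dump_dom_command(
    url: str,
    chrome_binary: str = "chromium",  # assumption: binary name varies by OS/install
    budget_ms: int = 5000,            # assumption: how long to let page JS run
) -> List[str]:
    """Build a headless Chrome command that prints the rendered DOM to stdout."""
    return [
        chrome_binary,
        "--headless",
        f"--virtual-time-budget={budget_ms}",  # let timers/JS advance before dumping
        "--dump-dom",                          # serialize the DOM after rendering
        url,
    ]


def snapshot_dom(
    url: str,
    chrome_binary: str = "chromium",
    budget_ms: int = 5000,
) -> Optional[str]:
    """Return the post-JS HTML for url, or None if the Chrome binary is not found."""
    if shutil.which(chrome_binary) is None:
        return None
    result = subprocess.run(
        build_dump_dom_command(url, chrome_binary, budget_ms),
        capture_output=True,
        text=True,
        timeout=60,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    html = snapshot_dom("https://example.com")
    print(html if html is not None else "Chrome binary not found")
```

The output is the DOM as it stands after scripts have run, which is why this approach captures content that a plain HTTP fetch would miss. Images, stylesheets, and fonts are still external references here, so a real single-file snapshot needs an extra inlining pass.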

For the very complex sites that really rely on a ton of interactive JS or dynamic requests to APIs to render their content, check out https://ArchiveWeb.page + https://ReplayWeb.page by https://webrecorder.io.