> HTML can easily be offline-able. Base64 your images or use SVG, put your CSS in the HTML page, remove all 2-way data interaction, basically reduce HTML to the same performance as PDF and allow it to be downloaded.
You're missing the point. Even a relatively computer-illiterate person can easily save a PDF to their hard drive, and it's significantly more difficult with HTML. At a minimum you're probably going to get an HTML file with a sidecar directory (or, I believe, sometimes a browser-specific archive; it's been a long time since I tried, since it works so poorly), and even that may not have the content you want due to dynamic sites.
As I explained, if the author wants to make HTML easily offlineable, then they can inline the CSS and Base64-encode the images. Or, you know, make the website printable. If authors actually thought about the print-to-PDF "problem", it could be solved with traditional CSS and HTML. As someone else said, we used to do this. It used to be part of my everyday web design job to make sure the page printed nicely.
The idea that the whole web is going to pander to edge-case archivers is asinine. This whole conversation is about supporting the needs of the very, very few and romanticizing the time when only interesting people used the internet. It's kind of elitist and self-serving.
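For what it's worth, here is a rough sketch of what "inline CSS and Base64 images" could look like as a build step. It is not a general-purpose tool: the function names `to_data_uri` and `inline_assets` are made up for illustration, and the regexes assume simple hand-written markup (double-quoted attributes, `rel` before `href`, assets stored next to the page). Remote and already-inlined images are left alone.

```python
import base64
import mimetypes
import re
from pathlib import Path


def to_data_uri(path: Path) -> str:
    """Encode a local file as a base64 data: URI."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_assets(html: str, base_dir: Path) -> str:
    """Inline local stylesheets and images so the page is a single file."""

    def inline_css(match: re.Match) -> str:
        # Replace the <link> tag with a <style> block holding the file's text.
        css = (base_dir / match.group(1)).read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    def inline_img(match: re.Match) -> str:
        src = match.group(2)
        # Skip images that are remote or already embedded.
        if src.startswith("data:") or "://" in src:
            return match.group(0)
        return f"{match.group(1)}{to_data_uri(base_dir / src)}{match.group(3)}"

    # Assumption: simple markup only; a real tool would use an HTML parser.
    html = re.sub(r'<link\s+rel="stylesheet"\s+href="([^"]+)"\s*/?>', inline_css, html)
    html = re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', inline_img, html)
    return html
```

The output is one HTML file that "Save As" can store without a sidecar directory, at the cost of roughly a third more bytes for each Base64-encoded image.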
I guess I don’t really understand the point being made. Does it matter that much that saving a page creates a single file on your hard drive? If you really want a static rendering of a site, why not just print it to a PDF? Why does that have to dictate the file format you use for distribution? With PDFs you don’t have to worry about conversion, but they are also comparatively larger over the wire.
> even that may not have the content you want due to dynamic sites
But PDFs also don’t give you dynamic content. Nothing is stopping people from using HTML to serve static, JS-less content. In fact that’s what it was originally designed to do. All this web app stuff was bolted on afterwards, and it’s optional.
What do we accomplish by having some people switch over to PDFs? The people who don’t care about bloat will continue to not care about it. It’s not like thin content will become more discoverable or more common. It doesn’t really change incentives. The author says using PDFs makes it so you’re not tempted to add cruft to your sites but that’s not really a compelling argument.
Getting content creators to produce content without bloat is not really a technical problem. It’s a cultural and economic one. I don’t see how a file format addresses that.
> Does it matter that much that the artifact of saving a page be a single file in your hard drive?
Yes, it matters a lot. Word/Excel files are actually zip archives containing many files and sub-directories. Can you imagine people working with exploded Word files, sending complete directory trees over email and WhatsApp?
The file format restricts the possibilities. You know what to expect when you see a PDF - static, JS-less content. With HTML on the other hand, it depends on what the author decided.
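If you want to see this for yourself, a minimal sketch with Python's standard `zipfile` module is enough. The helper name `list_parts` is just for illustration; it accepts a path or a file-like object.

```python
import zipfile


def list_parts(docx) -> list[str]:
    """Return the member paths inside an OOXML (.docx/.xlsx) container."""
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()
```

Calling `list_parts("report.docx")` on a real Word document (the filename here is hypothetical) typically lists entries such as `[Content_Types].xml` and `word/document.xml`, which is the directory tree nobody has to think about because it ships as one file.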
Or I could just make sure that my page prints reasonably well (we used to do this) and use the print-to-pdf functionality available in modern browsers.
But if you want a page in PDF, you can print it to PDF. Sure, non-computer-savvy users might not know how to do it off the bat, but browsers make it pretty easy.
Oh, I know that. I just meant that if your goal is for the website to be easily archivable, rather than publishing the website as PDF you could use simple HTML which wouldn't suck when printed to PDF.
Say hello to your new sidecar directory (or broken CSS/images/God knows what else)!
I tried to save an NY Times article, and it 1) needed JS to display anything, 2) even with the sidecar stuff was broken, 3) it was so plastered with ads and other junk I thought it was incomplete (it wasn't, I just had to scroll waaay down past something that looked like a footer and some voids after that).
If you save a PDF, you get that exact PDF on your hard drive, and when you open it (even in 10 years) it will look exactly the same as it did on the site.
This is of course the point of the article - that the web is a giant steaming pile of shit for the most part, plagued by JS and external resource requirements, all of which contribute to massive total page size.
I'll preface by saying I have some expertise in HTML, but none in PDF (the format).
Most commenters who suggest that HTML is still a better alternative to PDF (I agree) are assuming that, if this is an important issue to you, you would craft your page in a simpler style than most of what we see on the web, making Print to PDF or Save As... more viable.
> PDFs and a PDF tool ecosystem exist today. No need for another ghost town GitHub repo with a promising README and v0.1 in progress.
This is news to me. I'm not sure that I buy it. PDFs have always been a pain in the ass to work with in my opinion. Maybe there are tools, but in my experience they aren't very good.
In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters. Another downside shows itself when trying to copy and paste the above quote: PDF formatting seems to be weird.
> In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters.
PDFs can be tiny if they do not embed fonts. Serving fonts is very much a complex technology in the HTML world.
Browsing the web is a pain in the ass if you don't use a browser compliant with up-to-date standards, but the whole "HTML can be lightweight" argument pretty much depends on avoiding much of today's standardisation. As an objection to the original argument, it is not comparing like with like.
> This is news to me. I'm not sure that I buy it. PDFs have always been a pain in the ass to work with in my opinion. Maybe there are tools, but in my experience they aren't very good.
> In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters. Another downside shows itself when trying to copy and paste the above quote: PDF formatting seems to be weird.
PDF is a display format. I once worked on a project parallel to a guy who was parsing PDF to extract text content. IIRC, text in PDFs is stored in a way that works fine for printing/rendering but not so well for manipulation (e.g. it's a bunch of commands to render line Z at position X,Y with font W). Those commands don't have to be in reading order, nor do they have the semantic meaning you can get from markup like HTML (e.g. a superscript can be nothing more than a different line rendered with a smaller font).
IMHO, PDF is actually less optimal than HTML for what this guy is advocating, except that it's precisely those limitations that have prevented PDF from becoming the mess that Web HTML has. Though that's probably in large part because the bloaters have been too distracted by the easier target that is HTML to bother.
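To make that concrete, here is a toy model of the problem, not a real PDF parser. The `TextRun` class and `reading_order` function are invented for illustration; they stand in for the positioned "draw this string at X,Y with this font size" commands an extractor has to work from, and the 2-point line tolerance is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    x: float
    y: float  # default PDF coordinates start bottom-left, so larger y is higher
    size: float
    text: str


def reading_order(runs, line_tolerance=2.0):
    """Guess reading order: group runs into lines by y, then sort left to right."""
    lines = []
    # Walk the runs from the top of the page down.
    for run in sorted(runs, key=lambda r: -r.y):
        if lines and abs(lines[-1][0].y - run.y) <= line_tolerance:
            lines[-1].append(run)  # close enough in height: same line
        else:
            lines.append([run])  # otherwise start a new line
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]
```

Even this heuristic breaks on the superscript case: a small run drawn a few points above the baseline falls outside the tolerance and comes out as its own "line", because nothing in the data says it belongs to the text next to it.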
I actually did this pretty recently, in an attempt to get some magazine articles onto my Kobo e-book reader since Pocket couldn’t fetch the paywalled ones (I do pay).
I figured I could just save the page, automate a few edits to get around dynamic stuff, and then use it as, you know, an HTML document.
Even with a nice friendly mostly-text literary magazine, after about five hours I gave up and just copy-pasted the rendered text.
HN is not a good site to illustrate the unpleasantnesses of navigating the modern web. As you'd hope for a hacker news site, it is very friendly to this sort of thing. Most sites aren't.
> You're missing the point. Even a relatively computer-illiterate person can easily save a PDF to their hard drive, and it's significantly more difficult with HTML. At a minimum you're probably going to get an HTML file with a sidecar directory (or, I believe, sometimes a browser-specific archive; it's been a long time since I tried, since it works so poorly), and even that may not have the content you want due to dynamic sites.
Ctrl+P -> Save as PDF
You don't need the page to be a PDF to save it as a PDF.