PDF Subpage Navigation (nibblestew.blogspot.com)
68 points by ingve on June 24, 2023 | 22 comments



I tried to implement this for pystitcher, where I wanted to create a separate ToC that would link to various sections across multiple PDFs that would get stitched together.

None of the PDF implementations in Python supported it in any form. It's just impossibly hard (or stupid).


Even if you did get it working, there is a very good chance that only Adobe Reader/Acrobat would support it.

This severely limits valid use-cases because users have become acclimatized to relying on in-browser viewers embedded in a web app (e.g. that of Google Drive, OneDrive, Dropbox, HubSpot, etc.) or built into the browser itself. These almost always are highly minimalistic implementations.

For example, in a previous company, we wanted to provide our clients with a PDF guide which had in-document jump links (which, in the world of HTML, would be called "URI fragment links") and also a functional table-of-contents in the form of PDF "bookmarks."

But I ran into two major blockers:

First, Microsoft Word is almost the only non-Adobe product which is capable of producing such PDFs, and only when _saving_ as a PDF, not when _printing_ as a PDF (which reduces the document to little more than would be printed on a page).

Second, the only in-browser PDF viewer I could find which supports these features was, somewhat surprisingly, the one which is provided in WordPress. It supports both.

A similar story played out among mobile and desktop PDF viewers which weren't Adobe.

So for our beautiful, accessibility-compliant, featureful PDF, we had to provide very strict instructions on how to view it properly. When provided for viewing online, it _had_ to be hosted in a WordPress instance. And when offered for download, it _had_ to be accompanied by loud instructions explaining the situation and requiring users to install the free Adobe Reader.

You likely dodged a bullet by avoiding the use of any advanced PDF feature.


> Even if you did get it working, there is a very good chance that only Adobe Reader/Acrobat would support it.

I'm not sure about this. At least on Linux, where I just tested - I have in-document jump links working across Chromium, Firefox, Evince, Okular, and even KOReader. Tested on a random arxiv PDF[0]. Chromium and KOReader don't highlight links, but clicking them still works.

[0]: https://arxiv.org/pdf/2305.06424.pdf


I didn't test on Linux, since we only permitted BYO Linux machines exceptionally.

And the PDF feature support landscape may have changed in the last year, since it was Q1 of last year (2022), I believe, when I last checked the in-browser PDF viewers in Google Drive, OneDrive, HubSpot, and Chrome for macOS and Windows. I suppose it might have been longer since I did the actual comparative testing.


> we wanted to provide our clients with a PDF guide

How often do you think they printed this guide, and of those instances, how often do you think they cared if the layout/pagination/etc. was identical to seeing it electronically? If that number is sufficiently low, I'd argue that the guide should've just been HTML.


Users seldom or never printed it.

I was told that we needed it (a computer requirements document) in PDF for legal reasons. I pushed back and legal confirmed that they did, in fact, need it to be in PDF format.

I suspect it had something to do with accreditation, accessibility, and/or student aid policy compliance, since this was a school.

But legal may also have found the relative transience of HTML documents to be a liability if a student threatened suit based on frequently changing policies, because I updated the requirements document twice yearly. PDFs made versioning very straightforward from a legal perspective, because the filename and download link always "needed" to be different between versions, without requiring dev work.


> Microsoft Word is almost the only non-Adobe product which is capable of producing such PDFs

LaTeX works great for links, automatic table of contents, references, and the rest. Basically just have the line `\usepackage{hyperref}` and include a `\tableofcontents` and you're set.


I've used LaTeX in ebook production, but on this occasion, I needed a process which could be handed off to a much less technical person.


> On the other hand you could implement a full choose-your-own-adventure book as a single page PDF using only subpage navigation.

That's really neat to see...and makes this issue suddenly so frustrating.

This kind of feature would absolutely transform the PDF experience. Imagine if LO alone could support it, maybe with the addition of a few viewers. I'll bet a lot of cool simulations or narrative experiences could be developed. Even if they're already doable elsewhere.

Thanks to the author for sharing the post.


If this had become common 20 years ago, I'm not sure PDF would be as appreciated as it is today. Immutable pages laid out in sequence are one of the great features of PDF. I'm not sure we'd enjoy stateful, opaque PDF state machines.


Well the kind of people who enjoy finding and programming weird machines might like it.

https://en.m.wikipedia.org/wiki/Weird_machine


Wow, it would be great to have software that supported this!

I’m an art teacher, and one of my favourite things to do with the kids is to create interactive digital comics. We use software like Krita to draw the pages, and then import them into Keynote to add hyperlinks between pages to turn them into choose-your-own-adventure style interactive fiction. And then we export to PDF…


For your use case you should definitely give "tableau noir" a try. It's an awesome open source tool I discovered at FOSDEM this year. The video of the talk is probably a better introduction than going to the tool directly: https://fosdem.org/2023/schedule/event/tableaunoir/

The tool: https://tableaunoir.github.io/


You might like pdfpc: https://pdfpc.github.io/


interesting.

One of the reasons I think this does not get used is that PDF as a document format got firmly entrenched as the go-to format for (1) printable documents, books, slides, or other strictly static content whose layout on every page is frozen, (2) fillable PDF forms where user interaction is strictly limited to predefined blank fields, and maybe (3) comments, annotations, and signatures on existing docs.

PDF is typically not used for actually "presenting" content in the form of slides to an audience where the presenter has to control how the information is revealed. Partly because PDF readers including Adobe Reader never had a good UX to support this use case.

Also, given the cost/prevalence disparity between editing tools for PPT and PDF, a layman would not be able to edit or create a PDF presentation (with animations) anyway. The critical mass was never achieved to make this a thing.


epub?


what about it?


It has anchor links, unlike PDF.


I must be misunderstanding what anchor link means.

PDF definitely supports internal "links" like a Table of Contents where clicking on a chapter name in TOC takes you directly to that chapter.

What am I missing?
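For anyone curious what such an internal link looks like inside the file, here is a minimal sketch, assuming nothing beyond the Python standard library. It hand-writes a two-page, blank PDF whose first page carries a Link annotation with a /Dest entry pointing at the top of page 2. The function name `build_pdf_with_internal_link`, the page size, and the link rectangle are all made up for illustration; real documents add content streams, fonts, and usually an outline as well.

```python
def build_pdf_with_internal_link() -> bytes:
    """Return a two-page blank PDF whose first page links to page 2."""
    objects = [
        # 1: document catalog
        b"<< /Type /Catalog /Pages 2 0 R >>",
        # 2: page tree listing both pages
        b"<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>",
        # 3: page 1, which carries the link annotation (object 5)
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> /Annots [5 0 R] >>",
        # 4: page 2, the jump target
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> >>",
        # 5: link annotation; /Dest names the target page object and a
        #    position on it (/XYZ left top zoom, null = keep current zoom)
        b"<< /Type /Annot /Subtype /Link /Rect [72 700 300 720] "
        b"/Border [0 0 1] /Dest [4 0 R /XYZ 0 792 null] >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("internal_link_demo.pdf", "wb") as fh:
        fh.write(build_pdf_with_internal_link())
```

Opening the resulting file in a few different viewers and clicking the bordered rectangle near the top of page 1 is a quick way to see which of them honor the jump.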


Could you please tell me how to do this? I've been trying for a long time to export HTML documents to PDF from the browser and could never get internal links to work.


Microsoft Word can do it, but you must _save_ the document as a PDF. I know this because I've done it dozens of times, as recently as a year ago.

Notably, you _can't_ use PDF printers to accomplish this. PDF printers reduce the document to only visual data, losing interactive features.

I know Word isn't part of your solution but maybe this distinction between saving the document as a PDF vs. running the document through a printer driver will be useful to bear in mind in your case.

But I also suspect that you might have succeeded in the past unknowingly. This is because many, _many_ PDF viewers do not support internal jump links. This is apparently too "advanced" a feature for most in-browser viewers, for example, including the built-ins.

If you aren't already doing it, always test whether features are working in Adobe Reader before checking with viewers which may have reduced featuresets.
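One rough way to tell whether an export kept its links at all, before blaming the viewer, is to look for Link annotations in the file itself. The sketch below is a naive byte scan, not a PDF parser: the helper name `count_link_annotations` is hypothetical, and it will miss annotations stored inside compressed object streams (common in PDF 1.5 and later), so a count of zero is only a hint, not proof.

```python
import re
import sys

# Matches "/Subtype /Link" with optional whitespace, but not e.g. "/Linked"
LINK_ANNOTATION = re.compile(rb"/Subtype\s*/Link\b")


def count_link_annotations(data: bytes) -> int:
    """Count uncompressed Link annotations visible in the raw PDF bytes."""
    return len(LINK_ANNOTATION.findall(data))


if __name__ == "__main__":
    # Usage: python count_links.py exported.pdf
    with open(sys.argv[1], "rb") as fh:
        print(count_link_annotations(fh.read()))
```

Note that this counts external URL links too, since those are also Link annotations. Comparing the count for a "saved as PDF" file against a "printed to PDF" file of the same document is where the difference should show up.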


Thanks a lot! I'm exporting HTML documents to PDF from the browser on a regular basis, and normal HTTP links have always worked without any problem, while jump links have been impossible.

Processing everything through Word after export could maybe be an option. In my experience, exporting from the browser has been the only way to get nice PDFs. Everything else I've tried has mangled the layout terribly, and the open source converters are all of low quality.

It could be that it is the PDF clients that refuse all jump links, but then the problem is unsolvable, since these PDFs have to work on normal consumer devices such as Android and iOS phones. Apart from the jump links, PDFs are a godsend, since they work well everywhere.



