There are PDF files and there are PDF files.
Many (most?) PDFs I run into are generated from Microsoft Word or some other MS product, with no structure at all.
The majority of people who use MS products don't understand or care about structure.
The WYSIWYG imperative means lots of markup to describe font size, color, and decoration,
making every section heading look the same without ever designating the text as a section heading.
The same happens with paragraphs, page breaks, and column flow.
The resulting document looks correct enough to the creator.
Other people who have a different version of Word,
different fonts,
and a thousand other little differences,
won't see it correctly.
That leads our author to generate a PDF, probably with embedded fonts,
to ensure uniform appearance across these thousand little exceptions.
The result is a document with the content mixed up so incomprehensibly with appearance controls as to be both unreadable
and without any residue of the underlying intended structure of the document's sections, headers, figures, paragraphs, captions, footnotes, or anything.
And then there are PDF files that are nothing more than a series of images of pages of text.
If you're lucky and the scans are clean, good OCR might be able to recover most of the content.
What I'm saying is,
the tool doesn't matter
if authors don't encode structure and formatting in semantically meaningful ways.
The market has needed a tool like that for 30 years.
A PDF document of the type I describe is like a broken egg.
Information is lost between the authoring and rendering,
to the extent that it's not clear recreating the original is even possible.
A typesetter could recreate the document by looking at it, doing some font research, and playing with the kerning for a while. Saying it's not possible to recreate a readable typeset document is absurd, no matter how twisted and insane the actual PostScript is.
More frequent launches with less ambitious progress per launch make good sense,
and follow the old-school approach used through Apollo to mitigate risk.
Having a lunar lander test in Earth orbit,
for example,
roughly the same mission as Apollo 9, is a good call.
Validating everything works together has been a sort of sore spot for the Artemis program.
And even the Apollo 10 mission, which went 99.99% of the way from the Earth to the Moon, down to just 15 km above the surface (but couldn't have landed; the LM structure was too heavy), was an incredibly important step. It's the sort of thing that people today would want to skip, since it doesn't seem flashy or necessary. Why take all the risk of going into lunar orbit and separating the modules (requiring the very first rendezvous not in Earth orbit) but not actually land on the Moon? It was about getting all of the ground procedures proven and worked out, and proving that the rendezvous would work and the crew could get home, so that the actual landing mission could focus its efforts on just the last 15 km, confident that all of the other problems had already been dealt with. Trying to do all of that in one mission would have been a gigantic mess; the Apollo 11 crew felt short of training time as it was.
Orion doesn't seem operationally or financially capable of launching more than once a year. It's not that they don't want to do test flights; it's that they can barely do anything.
I recall a time when GitHub was having an outage while a coworker and I were trying to fix a high-priority issue.
I had pushed my changes before the outage but he couldn't pull them.
I proposed that I share my repo locally so he could pull from me,
but he looked confused
and didn't get it,
so I let it drop.
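For what it's worth, the idea works without any server at all: git can serve a repo read-only over the LAN with `git daemon`, or pack commits into a single bundle file you can hand over any way you like. A minimal sketch, with made-up paths and names rather than anything from the actual incident:

```shell
# Sketch: moving commits peer-to-peer while the central remote is down.
# All paths and branch names below are illustrative.

# Option A: serve your repo read-only on the LAN; the coworker pulls
# straight from your machine (shown as comments only):
#   you:      git daemon --base-path=/path/to/parent --export-all --reuseaddr
#   coworker: git pull git://your-lan-ip/your-repo main

# Option B: pack the commits into one file and send it over chat/scp/USB.
set -e
workdir=$(mktemp -d)
cd "$workdir"

git init -q -b main mine                      # stands in for my local repo
git -C mine -c user.name=Me -c user.email=me@example.com \
    commit -q --allow-empty -m "urgent fix"

git -C mine bundle create "$workdir/fix.bundle" main  # one portable file
git clone -q -b main "$workdir/fix.bundle" theirs     # coworker clones or pulls from it
git -C theirs log --oneline -1                        # the "urgent fix" commit is there
```

A bundle also works as a plain remote (`git pull ../fix.bundle main`), so the coworker wouldn't even have had to re-clone.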
Make them request it.
Put a link to it on every page served from your site,
in the footer or sidebar.
Make the text or icon for the link invisible to humans: set the text color to match the background and use the smallest point size you can reasonably support.