Hacker News: cratermoon's comments

There are PDF files and there are PDF files. Many (most?) PDFs I run into are generated from Microsoft Word or some other MS product with no structure at all. The majority of people who use MS products don't understand or care about structure. The WYSIWYG imperative means lots of markup to describe font size, color, and decoration, used to make every section heading look the same without ever designating the text as a section heading. The same happens with paragraphs, page breaks, and column flow. The resulting document looks correct enough to its creator. Other people, who have a different version of Word, different fonts, and a thousand other little differences, won't see it correctly. That leads our author to generate a PDF, probably with embedded fonts, to ensure a uniform appearance across those thousand little differences.

The result is a document with the content mixed up so incomprehensibly with appearance controls as to be both unreadable and without any residue of the underlying intended structure of the document's sections, headers, figures, paragraphs, captions, footnotes, or anything.

And then there are PDF files that are nothing more than a series of images of pages of text. If you're lucky and the scans are clean, a good OCR tool might be able to recover most of the content.

What I'm saying is, the tool doesn't matter if authors don't encode structure and formatting in semantically meaningful ways.


So what you are actually saying is that there is a market for a tool that will recreate the PDF with a structure based on how the original PDF looks?

The market has been needing a tool like that for 30 years. A PDF document of the type I describe is like a broken egg. Information is lost between the authoring and rendering, to the extent that it's not clear recreating the original is even possible.

A typesetter could recreate the document by looking at it, doing some font research, and playing with the kerning for a while. Saying it's not possible to recreate a readable typeset document is absurd, no matter how twisted and insane the actual PostScript is.

have you priced an Uber lately?

More frequent launches with less ambitious progress per launch make good sense and follow the old-school approach used through Apollo to mitigate risk. Having a lunar lander test in Earth orbit, for example, which is roughly the same mission as Apollo 9, is a good call. Validating that everything works together has been something of a sore spot for the Artemis program.

And even the Apollo 10 mission, which went 99.99% of the way from the Earth to the Moon, to within about 15 km of the surface (though it couldn't have landed: the LM structure was too heavy), was an incredibly important step. It's the sort of thing people today would want to skip because it doesn't seem flashy or necessary. Why take on all the risk of going into lunar orbit and separating the modules (requiring the very first rendezvous outside Earth orbit) but not actually land on the Moon? It was about getting all of the ground crew's work proved and worked out, and proving that the rendezvous would work and the crew could get home, so that the actual landing mission could focus its efforts on just working out the last 15 km, confident that all of the other problems had already been dealt with. Trying to do all of that in one mission would have been a gigantic mess; the A11 crew felt a lack of training time as it was.

Orion doesn't seem operationally or financially capable of launching more than once a year. It's not that they don't want to do test flights, it's that they can barely do anything.

Which goes back to the pork-on-a-stick requirement that everything be about keeping the workers employed.

"The EARS format (Easy Approach to Requirements Syntax) turns natural language requirements into structured, testable statements."

Yes, that's called "programming", as Dijkstra explained https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


If you gave away as much money proportionately, you'd have about 75 fewer dollars in your pocket.

Tell me again how generous billionaires are.


Remember the good ol' days of the last century when we worried about Big Government spying on us?

I didn't even have to look to know that they'd added a section on AI sloperators.

Want my feedback? Delete the AI bullshit and go back to teaching programmers how to learn and understand what they are building.


"The only way to prove that someone is old enough to use a site is to collect personal data about who they are."

This is not true, as others have pointed out. Kind of sad to see no mention of privacy-preserving technology already in use in an IEEE article.


I recall a time when GitHub was having an outage just as a coworker and I were trying to fix a high-priority issue. I had pushed my changes before the outage, but he couldn't pull them. I proposed sharing my repo locally so he could pull from me, but he looked confused and didn't get it, so I let it drop.

Make them request it. Put a link to it on every page served from your site, in the footer or sidebar. Make the text or icon for the link invisible to humans by setting the text color to match the background and using the smallest point size you can reasonably support.
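A minimal sketch of the footer link described above, assuming the site builds its HTML in Python. The function name `hidden_footer_link`, the path `/honeypot`, and the white background are all hypothetical placeholders; substitute whatever URL you actually want requested and your page's real background color.

```python
import html


def hidden_footer_link(href: str = "/honeypot", background: str = "#ffffff") -> str:
    """Return an HTML footer holding a link that humans are unlikely to notice.

    Both defaults are hypothetical: `href` is whatever URL you want crawlers
    to request, and `background` must match the page's actual background.
    """
    # Text color equals the background and the font is as small as practical,
    # so the link is effectively invisible on screen.
    style = f"color: {background}; font-size: 1px; text-decoration: none;"
    return (
        "<footer>"
        f'<a href="{html.escape(href, quote=True)}" style="{html.escape(style, quote=True)}"'
        # Keep the link out of screen readers and keyboard tab order so
        # human visitors using assistive technology don't stumble onto it.
        ' aria-hidden="true" tabindex="-1">.</a>'
        "</footer>"
    )


if __name__ == "__main__":
    print(hidden_footer_link())
```

The `aria-hidden` and `tabindex` attributes are an added design choice, not part of the original suggestion: they hide the link from assistive technology and keyboard navigation while leaving it in the markup that a crawler parses.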

