
Why don't they convert the PDF to an image and then back to a PDF? This approach seems a lot more efficient, and less prone to other types of human error (e.g. a missing page). Is there still an attack vector?


It's a bit like point-and-speak checklists on aircraft: it takes a certain amount of energy to do, so you can't skip it without deliberately choosing to.


The Japanese train system uses similar concepts, IIRC. When I first read about this I was astonished at how effective it was [0] (up to 85% error reduction)!

[0] https://www.atlasobscura.com/articles/pointing-and-calling-j...

[1] https://news.ycombinator.com/item?id=18952193


Toronto and New York City use a similar point-only system on their subway systems, though without the white gloves.

https://www.theglobeandmail.com/canada/toronto/article-autom...


If you do that, look at the document, hit CTRL+Z, and look at the document again, it will likely look identical, because rendering a PDF to a JPEG at 70-90% quality... at ~600 DPI... then scaling it back down to a 75-150 DPI screen... is going to look visually lossless.
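To put rough numbers on the "visually lossless" point, here is a back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 in), which the comment doesn't specify, and `screen_view` is a hypothetical helper written only for this illustration.

```python
def screen_view(render_dpi: float, screen_dpi: float, page_in=(8.5, 11.0)):
    """Return the rendered pixel size of a page (assumed US Letter by default)
    and how many rendered pixels map onto each on-screen pixel."""
    w_px = round(page_in[0] * render_dpi)
    h_px = round(page_in[1] * render_dpi)
    linear = render_dpi / screen_dpi       # downscale factor along each axis
    return (w_px, h_px), linear * linear   # rendered pixels per screen pixel


for screen in (75, 150):
    print(screen, screen_view(600, screen))
```

At 600 DPI an 8x8 JPEG block spans 1/75 inch, so on a 75 DPI screen each whole compression block collapses into a single screen pixel, which is why the artifacts are so hard to spot.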

So, not only do you have the energy-investment thing noted in the/a sibling comment, you have the issue that there's no giant "THIS IS AN IMAGE" or "THIS HAS TEXT IN IT" that you can just Look At and know that yeah the document is okay. There's no lowest-common-denominator provability thing. You have to hyperspecifically know what to look for (render to image) then know how to verify whether it's an image or not.

And... how do you verify if it's an image? I don't have any PDF authoring/editing software on this machine, so the only thing I can think of is checking the Undo menu for "convert to image" or similar.


There will be no CTRL+Z, since the conversion can only save to a new document (just like scanning).

Under the hood, you create a new document, rasterize the original document page by page as JPEGs, and insert the JPEGs into the new document.
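A minimal sketch of the second half of that pipeline (wrapping already-rendered JPEG pages into a fresh PDF), using only the Python standard library. It assumes some renderer has already produced baseline RGB JPEG bytes for each page along with their pixel sizes; `build_image_only_pdf` and its `dpi` parameter are hypothetical names, not part of any existing tool. Because the file is built from scratch, nothing from the original's text layer can carry over into it.

```python
def build_image_only_pdf(pages, dpi=150):
    """Wrap pre-rendered JPEG pages into a brand-new PDF.

    pages: list of (jpeg_bytes, width_px, height_px), one tuple per page
           (assumed baseline RGB JPEGs).
    dpi:   resolution the pages were rendered at; sets the physical page size.
    """
    objects = []  # object bodies; list index i holds object number i + 1

    def add(body: bytes) -> int:
        objects.append(body)
        return len(objects)

    catalog = add(b"")    # placeholders, filled in once the page list is known
    pages_obj = add(b"")
    kids = []
    for jpeg, w, h in pages:
        wpt, hpt = w * 72 / dpi, h * 72 / dpi  # pixels -> PDF points
        img = add(
            (f"<< /Type /XObject /Subtype /Image /Width {w} /Height {h} "
             f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode "
             f"/Length {len(jpeg)} >>\nstream\n").encode() + jpeg + b"\nendstream")
        # Content stream: scale the unit square to the page and paint the image.
        content = f"q {wpt:.2f} 0 0 {hpt:.2f} 0 0 cm /Im0 Do Q".encode()
        cont = add(f"<< /Length {len(content)} >>\nstream\n".encode()
                   + content + b"\nendstream")
        page = add(
            (f"<< /Type /Page /Parent {pages_obj} 0 R "
             f"/MediaBox [0 0 {wpt:.2f} {hpt:.2f}] "
             f"/Resources << /XObject << /Im0 {img} 0 R >> >> "
             f"/Contents {cont} 0 R >>").encode())
        kids.append(page)

    kid_refs = " ".join(f"{k} 0 R" for k in kids)
    objects[catalog - 1] = f"<< /Type /Catalog /Pages {pages_obj} 0 R >>".encode()
    objects[pages_obj - 1] = (
        f"<< /Type /Pages /Kids [{kid_refs}] /Count {len(kids)} >>").encode()

    # Serialize objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"               # entry for the free object 0
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()  # each entry is exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root {catalog} 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

Producing the JPEGs in the first place still needs a real PDF renderer (for example Poppler's `pdftoppm` or Ghostscript), which is out of scope for this sketch.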

You could even create a fake "printer" that outputs a PDF with rasterized images as pages, so you don't have to teach the office clerks to do anything extra.

To me, it seems to be indistinguishable from printing and scanning.

PS: It's pretty easy to verify programmatically that a page contains nothing but an image, especially if you also wrote the software that rasterized it in the first place.
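As a rough illustration of that check, here is a sketch that inspects a page's content stream and accepts it only if it paints a single image and draws no text. It assumes the stream has already been extracted and decompressed (in real files it is usually Flate-compressed), and the tokenizer is deliberately crude; `is_image_only` is a hypothetical helper, not a library function.

```python
import re

# Text-object, text-state and text-showing operators from the PDF spec; any of
# these means the page draws text rather than only an image.
TEXT_OPERATORS = {"BT", "ET", "Tj", "TJ", "'", '"', "Tf", "Td", "TD", "Tm", "T*", "Tr"}
NUMBER = r"[-+]?\d*\.?\d+"


def is_image_only(content: str) -> bool:
    """Crude check on a *decompressed* page content stream (an assumption)."""
    # Tokens: names such as /Im0, numbers, or bare operator words.
    tokens = re.findall(r"/[^\s/\[\]()<>]+|" + NUMBER + r"|[A-Za-z*'\"]+", content)
    ops = [t for t in tokens
           if not t.startswith("/") and not re.fullmatch(NUMBER, t)]
    if any(op in TEXT_OPERATORS for op in ops):
        return False
    # Allow only save/restore, one transform, and a single XObject paint.
    return set(ops) <= {"q", "Q", "cm", "Do"} and ops.count("Do") == 1


print(is_image_only("q 612.00 0 0 792.00 0 0 cm /Im0 Do Q"))
print(is_image_only("BT /F1 12 Tf 72 720 Td (Secret) Tj ET"))
```

This also rejects scans that carry an invisible OCR text layer, since that layer is still drawn with `BT`/`Tj` operators.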


> It's pretty easy to verify if the page contains nothing but an image, programmically, especially if you also wrote the software that rasterize it in the first place.

It's pretty easy for a computer to verify any of this; the point is making it idiot-proof. If you process hundreds of documents a year and there's no way to visually tell a badly redacted document from a well-redacted one, you don't have to be much of an idiot to screw up once. Especially when the only difference between them is whether you remembered to push the "redact correctly" button or, failing that, the "verify it was redacted correctly programmatically" button before hitting send.

What you do is create a ritual where you have to walk across the room and use a physical machine. You'll remember doing that. And if you don't, since the output will look a bit crap, you can confirm it trivially.

Creating a process that has to be done perfectly every time or it fails catastrophically, and has few indications of failure during the process, is worse than having no process at all.


It is probably still easier to screw up on a computer than by looking at physical documents to verify them and then scanning them.
