3D objects in PDFs are cool. My thesis used those in a few places. The PDF would...

thangngoc89 · on Jan 2, 2021

Do you know of any automated way of extracting 3D objects from PDFs? I'm a dentist by profession and have worked with various 3D and CAD/CAM systems. I have an intra-oral scanner that captures a 3D colour model of the inside of your mouth. The sad thing is that the entire system is a walled garden. It uses its own 3D format (.dxd) and only offers STL as an export format, which doesn’t contain any colour information. I worked around this by first exporting to a 3D PDF file. Then I use Sumatra PDF [1] to MANUALLY extract the 3D model in U3D format. U3D is a very obsolete format that almost no 3D authoring program can read, so I have to use (yet another) piece of proprietary software [2] to convert it to a common 3D format like PLY or 3DS, or even to WebGL [3].

[1]: https://www.sumatrapdfreader.org/free-pdf-reader.html [2]: http://www.finalmesh.com/ [3]: https://khoadabest.surge.sh

exikyut · on Jan 2, 2021

Besides the other ideas in this subthread, the first thing that springs to mind for me is scanning a bunch of random objects, converting the models to as many 3D formats as you reasonably can, and dumping everything on GitHub along with reference photos of the objects.

I'm personally idly curious, but have no experience with reverse engineering or 3D or file formats... so the emphasis on my end is idle curiosity :). But it's possible that many such people poking around may still generate interesting leads.

Depending on how effectively intraoral scanners can scan things other than teeth, offering to scan random objects people send/bring in, on a best-effort/no-warranty basis, may also generate practical interest.

(Also, wow, looks like these things are in the $25k range?)

thangngoc89 · on Jan 2, 2021

I think this is a pretty nice idea. I will let you know once I've set this up. And just FYI, these expensive machines actually cost around $50k. The $25k range is for scanners that have no color and require you to coat the tooth with a layer of powder to prevent reflections from interfering with the scanner.

exikyut · on Jan 5, 2021

Wow, nice :) I can imagine color being incredibly helpful... and not needing powder certainly makes the process more user friendly and less intrusive.

Also, a very small extra thought, scanning extremely simple objects like cubes and flat planes may make the analysis process slightly easier because the data in the file will be easier to pars--wait. Okay I have more ideas.

Can you convert/import arbitrary 3D data into the proprietary dxd format? If there is any way to do this, there is nothing else that will move the analysis process as far forward as quickly, and offer the best chances of producing the most complete result. This is because a) the data files will have 99% less complexity due to being synthetic and not containing noise associated with real-world data, b) they'll be full of reference points from known 3d models, and c) entirely controllable input data gives the highest chance of figuring out all values/fields in the model files.

If this is possible, chances are most imports would be user requests based on the analysis process ("does changing this value alter this byte?"). Initial ideas I can think of would be the 3D Teapot, a single pixel :D, and simple cubes, triangles and planes.
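The "does changing this value alter this byte?" workflow can be sketched as a byte-level diff between two files that were produced from almost identical inputs. This is only a minimal illustration: `diff_ranges` is a hypothetical helper, and nothing here assumes any knowledge of the actual DXD layout.

```python
def diff_ranges(a, b):
    """Return half-open (start, end) offset ranges where bytes a and b differ.

    If the inputs have different lengths, the extra tail of the longer
    one is reported as a final range of its own.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Running it on two exports that differ in a single known parameter narrows down which offsets encode that parameter, assuming the format is not compressed or encrypted as a whole.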

Lastly, coordinating a backup installation of the scanning software onto a dedicated machine, or moving the main install onto such a machine, that enterprising reverse-engineers could connect to remotely (ideally at any time of day, and obviously after privately negotiating credentials) and install debugging tools (read: IDA/Ghidra/etc) onto, would likely be extremely helpful, and should provide the best "how we reversed this" narrative with regards to licensing. This would simplify the import request situation too.

If importing is not possible, IDA et al may end up being necessary to understand certain complex details or possibly even get started. Solving the "generate interest" problem would naturally be more complex in this scenario though. :/

I think I've really exhausted my knowledge in this area now :), although I do remain interested in knowing how things go.

thangngoc89 · on Jan 5, 2021

Hey, thank you for replying to this old thread. I found some time to scan some fake models, to avoid any legal issues with publishing real patients' data on the Internet.

I published all data in this Github repo: https://github.com/thangngoc89/dxd-file-format

I also tried to scan something simple like a sphere or a pencil, without any success. The software only recognizes tooth-like structures.

Luckily, it can export to STL files with 100% triangles that can be imported into other dental CAD software, so I hope that helps with the progress.
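Since the STL export gives known geometry for the same scan, its vertex coordinates could serve as reference values to search for in the DXD data. A minimal sketch of reading the triangles, assuming the export is a binary STL (80-byte header, a little-endian triangle count, then 50 bytes per triangle); `read_binary_stl` is a hypothetical helper name and ASCII STL files are not handled.

```python
import struct

def read_binary_stl(data):
    """Parse binary STL bytes into a list of triangles.

    Each triangle is a tuple of three (x, y, z) vertex tuples.
    The per-triangle normal and attribute word are read but discarded.
    """
    (count,) = struct.unpack_from("<I", data, 80)  # count follows the 80-byte header
    triangles = []
    offset = 84
    for _ in range(count):
        # 12 floats (normal + 3 vertices) and one uint16 attribute = 50 bytes
        values = struct.unpack_from("<12fH", data, offset)
        triangles.append((values[3:6], values[6:9], values[9:12]))
        offset += 50
    return triangles
```

For a file on disk this could be called as `read_binary_stl(open("model.stl", "rb").read())`, where `model.stl` is a placeholder name.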

exikyut · on Jan 5, 2021

Wow, the colorization the software provides is seriously impressive.

It's regrettable but understandable that the software only recognizes/accepts teeth considering the postprocessing it clearly does.

And CC0ing the model data is pretty much the textbook approach to analytical freedom :)

(And just to confirm, STL/etc->DXD isn't possible?)

thangngoc89 · on Jan 6, 2021

Yes, it is possible to go from STL to DXD. But the last time I tried that feature, the software crashed. I will try it again when I’m back at the office.

Thank you for reminding me about this.

Quick update: opening the DXD file with a hex editor, there is an XML file defining the metadata of the current file, and a public 1024-bit RSA key. I’ve been scouring around to find the private key, with no success.
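What the hex editor shows can also be pulled out programmatically, in the spirit of the Unix `strings` tool. The sketch below lists runs of printable ASCII with their offsets; `ascii_runs` is a hypothetical helper, and it assumes the embedded XML is stored as plain ASCII/UTF-8 text rather than compressed.

```python
import re

def ascii_runs(data, min_len=8):
    """Return (offset, text) for each run of printable ASCII in data.

    Runs shorter than min_len are skipped to cut down on noise from
    binary data that happens to look like text.
    """
    pattern = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

Filtering the result for entries that start with `<?xml` would give the offset of the metadata block, which in turn hints at where the binary payload begins and ends.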

exikyut · on Jan 8, 2021

Just saw this, sorry for the delay.

Hmmmm. Ideally that key is only being used for attestation/authentication, not encryption. In this case, you definitely don't want to locate the private key, because that key's confidentiality is what verifies the integrity of the scans made by your device.

Also, said private key might be specific to your copy of the software to create a chain of custody to your machine for medical purposes, or even more likely for licensing reasons.

In any case, if it's being used for encryption, that would amount to an unfortunate DRM situation that might be a bit of a hornet's nest to fiddle with, because of the high likelihood that the key is being used for license enforcement etc (tracking scans made by copies of software deemed illegitimate etc).
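To illustrate the attestation case: with a signature scheme, the public key embedded in a file is enough to check a signature, while producing one requires the private key. The sketch below uses textbook RSA with tiny, completely insecure toy numbers (p = 61, q = 53) chosen purely for illustration. It says nothing about how the DXD format actually uses its key, and real systems use padded signatures with far larger keys.

```python
import hashlib

# Toy key pair: n = 61 * 53, e * d = 1 (mod 60 * 52). Insecure by design.
N, E = 3233, 17   # public part: could be shipped inside every file
D = 2753          # private part: would stay with the vendor or device

def digest(data):
    """Hash the data and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data):
    """Create a signature; this needs the private exponent D."""
    return pow(digest(data), D, N)

def verify(data, signature):
    """Check a signature using only the public values N and E."""
    return pow(signature, E, N) == digest(data)
```

Anyone holding only `N` and `E` can run `verify`, but cannot run `sign`, which is why finding the public key in the file is expected and harmless in this scenario.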

It's very cool you can go from STL to DXD though. Now I'm curious, was the STL file that crashed the software originally generated from a DXD file created by the software? It originally being a DXD should be irrelevant, but chances are the pipeline inside the software chokes on things that aren't models of teeth. This does admittedly make the reverse engineering process trickier...

petters · on Jan 2, 2021

I don't, sorry. From what you wrote, you definitely seem more knowledgeable than me in this area.

thangngoc89 · on Jan 2, 2021

Thank you for your input. I forgot to mention in the original post that there is a tool called pdf-parser.py [1] which claims to be able to do that, but it produces broken output. I don’t know enough about Python or PDF internals to hack on it. I'm posting it here in the hope that the HN crowd can point me in the right direction.

[1]: https://blog.didierstevens.com/programs/pdf-tools/

solresol · on Jan 2, 2021

I'd like to talk to you about this, but you don't have any contact details in your profile. You can find my email address in my profile.

thangngoc89 · on Jan 2, 2021

Thank you very much. I updated my profile with an email address. In any case, I have emailed you via the contact details in your profile.

aidos · on Jan 2, 2021

Top tip: install mutool and run

mutool clean -d your.pdf clean.pdf

Now open clean.pdf with a text editor.

thangngoc89 · on Jan 2, 2021

That's really a top tip! Thank you very much. It looks like the streams in the original file are compressed using FlateDecode. Passing the file through mutool decompresses all streams and lets the parser do its job.

Thank you!
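One way to confirm that observation is to count the `/Filter` entries in the file before and after the mutool step. FlateDecode is the PDF name for zlib/deflate compression, so a cleaned file should show few or no such entries. This is a rough sketch that scans the raw bytes with a regular expression instead of parsing the PDF properly; `count_stream_filters` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as the array form "/Filter [/A /B]".
FILTER_RE = re.compile(rb"/Filter\s*(?:/(\w+)|\[([^\]]*)\])")

def count_stream_filters(pdf_bytes):
    """Count how often each stream filter name appears in raw PDF bytes."""
    counts = Counter()
    for single, array in FILTER_RE.findall(pdf_bytes):
        names = [single] if single else re.findall(rb"/(\w+)", array)
        counts.update(name.decode("ascii") for name in names)
    return counts
```

Comparing `count_stream_filters` on `your.pdf` and on `clean.pdf` should show the FlateDecode count dropping after `mutool clean -d`.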

aidos · on Jan 2, 2021

Great! Glad it worked. Happy to help you unpick things a bit further. When you look inside the PDF file you’ll see that it’s actually a “tree” of “things”. Each one starts with “1234 0 obj” (or something like that), and they reference each other to build the structure. For example, the document is made of a list of pages, so that’s one object. Each page is another object, and each page is in turn made of a bunch more stuff, and so on. Somewhere in there, no doubt, you’ll find an object that’s your model.
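Finding that object can be sketched in a few lines of Python. In the PDF specification, a 3D model is stored as a stream object whose dictionary has `/Subtype /U3D` (or `/PRC`). The code below is a naive regex scan, not a real PDF parser: it ignores the cross-reference table and object streams, and it can be fooled by binary data that happens to contain `endstream` or `endobj`. `find_3d_streams` is a hypothetical helper name.

```python
import re
import zlib

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
SUBTYPE_RE = re.compile(rb"/Subtype\s*/(U3D|PRC)\b")

def find_3d_streams(pdf_bytes):
    """Yield (object_number, subtype, data) for each 3D stream object."""
    for obj in OBJ_RE.finditer(pdf_bytes):
        body = obj.group(3)
        stream = STREAM_RE.search(body)
        if stream is None:
            continue                       # plain object without a stream
        header = body[:stream.start()]     # the dictionary before "stream"
        subtype = SUBTYPE_RE.search(header)
        if subtype is None:
            continue                       # a stream, but not a 3D model
        data = stream.group(1)
        if b"/FlateDecode" in header:
            data = zlib.decompress(data)   # only needed if still compressed
        yield int(obj.group(1)), subtype.group(1).decode("ascii"), data
```

With the output of the mutool step, each result of `find_3d_streams(open("clean.pdf", "rb").read())` could be written to a `.u3d` file and passed on to the converter.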

daeken · on Jan 2, 2021

Any chance you could get me some dxd files? I'd love to take a stab at reverse-engineering this and writing a direct converter to something standard. Feel free to email them to me (email in bio).

thangngoc89 · on Jan 2, 2021

Absolutely yes! I don't have any files that don't contain sensitive patient information on my laptop at the moment, but I will create new files when I'm back at work on Monday. I will email you when I have the files.

thehesiod · on Jan 2, 2021

I'd look at the 3D JS API via the JS bridge that I helped write: https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdf...

namdnay · on Jan 2, 2021

Interestingly enough one of my early jobs was pretty much the opposite, writing a u3d encoder from spec and then using a commercial c++ pdf manipulation library to inject them into pdf files.

I am sure there are libraries in Python or JS nowadays; it’s just a question of parsing the tree to find the U3D node and dumping it out, very simple.

thangngoc89 · on Jan 2, 2021

HN is really the only place where you can ask a question and receive answers from someone who has actually worked on the problem before. And you're correct that all I need to do is find the U3D node and dump it out. See my response in the parent thread about using pdf-parser.py.

mkl · on Jan 2, 2021

Getting it to work in Latex is easy if you use Asymptote: https://asymptote.sourceforge.io/

Back in 2011 I used it to make a whole lot of figures for a multivariable calculus course; they're still in use.

abhgh · on Jan 2, 2021

I am considering animations for my thesis. When printing, a designated frame should be used, but inside a reader, the animation should work. I am writing my thesis in LaTeX too. Any pointers?

ktpsns · on Jan 2, 2021

You'd probably do better to invest the time in preparing a couple of beautiful Jupyter notebooks. That's where people expect interactivity and code to happen, not in PDFs. In my scientific community, virtually nobody uses Adobe Reader (people on Mac use Preview.App, people on Linux use poppler/xpdf/evince, browsers have their own internal readers).

Edit/Addendum: Crafting an interactive website (i.e. without the dependency on Jupyter) might be more future-proof.

thangngoc89 · on Jan 2, 2021

Idyll (https://idyll-lang.org/) is a very promising tool in this field.

ktpsns · on Jan 6, 2021

Thanks for the link! I think this is closely connected to Donald Knuth's literate programming paradigm, i.e. there are also platforms which generate beautiful reports out of the embedded comments in your traditional source code. However, you won't get the interactivity for free.

I personally prefer Jupyter because it separates the programming language (Python, R, C++, etc.) from the representation (for instance the Web) and still allows interactivity to a certain degree (given a backend running the source code).

abhgh · on Jan 2, 2021

This is an interesting project, thank you!

BlueTemplar · on Jan 2, 2021

He didn't say anything about interactivity though. But even a lower bar than this : just animation, is not currently cleared by the available document formats.

(And a website doesn't fit the requirements as it's not contained in a single file, so its archival is a lot more complicated.)

abhgh · on Jan 2, 2021

Unfortunately my school expects a PDF thesis. But you make a good point about popular alternate readers not supporting animations - maybe this is a wasted effort. Thank you! Probably better to link to notebooks or videos of the animations on vimeo/youtube.

ktpsns · on Jan 6, 2021

Yes, PDF is the standard. Some people manage to generate HTML from their pdfLaTeX code. This could be a nice starting point for an enriched reading experience in the web browser. However, with standard print-first LaTeX, I could never generate HTML for any nontrivial document (especially large documents with many hundreds of pages).

Putting your animations in traditional video formats (mp4, ogg) or on vimeo/youtube is probably the best way to make them accessible for most people. Many scientific labs have their own youtube channels.

thangngoc89 · on Jan 2, 2021

Many 3D authoring programs, like Blender and Meshmixer, can output U3D or PRC, which you can embed into 3D PDF files. There are just many tools that can read the format. But beware that only Adobe Reader can show the 3D object.

abhgh · on Jan 2, 2021

Thank you! Yes, I wasn't thinking about the read-time support.

mkl · on Jan 2, 2021

Note that PDF 3D models have a static image (a bitmap) which readers that don't support 3D (most of them) will show instead. Actually Adobe Reader shows the static image too, until you click on it to activate the 3D rendered version.

BlueTemplar · on Jan 2, 2021

I had a similar issue recently (for a much smaller project though). The sad reality is that it looks like we currently don't have an actual, working, properly supported standard for electronic documents, which would include something as (relatively) basic as animation support : https://news.ycombinator.com/item?id=25612066