
If you’re not a perfectionist about layout precision, MS Word provides the same metadata encoding that you get from structured document layout applications like FrameMaker, Ventura, InDesign or Quark. It does this through the styles you apply (‘this is a heading’, ‘this is a definition’, etc).

If the PTO has provided an MS Word style template and a document schema (document template), it is dead easy to extract a useful XML encoding for further analysis. There is a lot in an MS Word file that you can simply ignore. It is also dead easy to write XPath and XQuery expressions that act as an API over the original DOCX document collection.
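Here is a minimal Python sketch of that workflow, using only the standard library. A .docx is a ZIP archive whose body lives in `word/document.xml` (WordprocessingML), and each paragraph's style is recorded in a `w:pStyle` element. The sketch pulls out (style ID, text) pairs, re-encodes them as a simpler XML, and queries that XML with ElementTree's limited XPath support rather than a full XQuery engine. The function names, the `<document>`/`<para>` output format and the `Definition` style ID are assumptions for illustration. Also note that `w:pStyle` holds the style *ID* (e.g. `Heading1`), not the display name, which lives in `word/styles.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside every .docx
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def styled_paragraphs(docx_file):
    """Yield (style_id, text) for every paragraph, including those inside tables.

    docx_file may be a path or a binary file-like object.
    Paragraphs without an explicit w:pStyle are reported as "Normal"
    (assumption: Word's usual default paragraph style ID).
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for p in root.iterfind(".//w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else "Normal"
        # Text is split across runs (w:r/w:t), so join the pieces back together.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        yield style_id, text


def to_simple_xml(docx_file):
    """Re-encode as <document><para style="...">text</para>...</document> (hypothetical format)."""
    doc = ET.Element("document")
    for style_id, text in styled_paragraphs(docx_file):
        para = ET.SubElement(doc, "para", style=style_id)
        para.text = text
    return doc


def paragraphs_with_style(doc, style_id):
    """Return the text of every <para> whose style attribute equals style_id.

    ElementTree's XPath subset supports [@attr='value'] predicates, which is
    enough for simple style-based queries.
    """
    return [para.text for para in doc.iterfind(f".//para[@style='{style_id}']")]
```

For example, `paragraphs_with_style(to_simple_xml("filing.docx"), "Definition")` would return the text of every paragraph tagged with a (hypothetical) `Definition` style in a (hypothetical) `filing.docx`. For real XQuery or full XPath, you could write the simplified XML out with `ET.ElementTree(doc).write(...)` and load it into a dedicated XML database or query engine.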
