If you’re not a perfectionist about layout precision, MS Word provides the same meta-data encoding function through the application of styles (‘this is a heading’, ‘this is a definition’, etc.) that you get from structured document layout applications like FrameMaker, Ventura, InDesign and Quark.
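In a DOCX file, applying a paragraph style is recorded as markup (a `w:pStyle` element in the paragraph properties), and that is what makes it usable as meta-data. The sketch below parses a hand-written fragment shaped like the body of `word/document.xml` and reports each paragraph's style ID alongside its text. The fragment, the function name `styled_paragraphs` and the custom style ID `Definition` are illustrative assumptions. Word records style IDs, which can differ from the display names shown in the Styles pane.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written fragment in the shape Word writes. "Definition" is a
# hypothetical custom style ID, not one Word defines by default.
SAMPLE_BODY = f"""<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
    <w:r><w:t>Claims</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:pStyle w:val="Definition"/></w:pPr>
    <w:r><w:t xml:space="preserve">A widget is </w:t></w:r>
    <w:r><w:t>a gadget.</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>Unstyled body text.</w:t></w:r>
  </w:p>
</w:body>"""


def styled_paragraphs(body):
    """Yield (style_id, text) for each top-level w:p under body.

    style_id is None when no style is applied explicitly (Word then
    falls back to the document's default paragraph style).
    """
    for p in body.iterfind("w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # A paragraph's text is split across runs (w:r/w:t); join them.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        yield style_id, text


for style_id, text in styled_paragraphs(ET.fromstring(SAMPLE_BODY)):
    print(style_id, "|", text)
```

The formatting of each run (fonts, sizes, spacing) can be ignored entirely: the style ID alone says what the paragraph *is*.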
If the PTO has provided an MS Word style template and a document schema (document template), it is dead easy to extract a useful XML encoding for further analysis. There is a lot in an MS Word file that you can simply ignore. It is also dead easy to write XPaths and XQueries that provide an API for the original DOCX document collection.
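A minimal sketch of that extraction in Python, using only the standard library. It assumes each file is a DOCX (a ZIP package whose main text lives in `word/document.xml`) and that the template's paragraph styles are applied consistently. The helper names (`paragraphs_by_style`, `query_collection`, `make_minimal_docx`), the file names and the style IDs are hypothetical. The demo builds in-memory ZIPs containing only `word/document.xml`, which is enough for this code but is not a complete file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraphs_by_style(docx):
    """Map style ID -> list of paragraph texts for one .docx (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    by_style = defaultdict(list)
    # ".//w:p" also picks up paragraphs inside table cells.
    for p in root.iterfind(".//w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Deleted text is stored as w:delText, so it is skipped here.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        by_style[style_id].append(text)
    return dict(by_style)


def query_collection(docs, style_id):
    """Return {doc_name: [texts]} for every document with paragraphs in style_id."""
    hits = {}
    for name, docx in docs.items():
        texts = paragraphs_by_style(docx).get(style_id, [])
        if texts:
            hits[name] = texts
    return hits


def make_minimal_docx(paragraphs):
    """Build an in-memory ZIP holding only word/document.xml (demo only)."""
    body = "".join(
        f"<w:p><w:pPr><w:pStyle w:val={quoteattr(s)}/></w:pPr>"
        f"<w:r><w:t>{escape(t)}</w:t></w:r></w:p>"
        for s, t in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


docs = {
    "app-001.docx": make_minimal_docx(
        [("Heading1", "Claims"), ("Definition", "A widget is a gadget.")]
    ),
    "app-002.docx": make_minimal_docx([("Heading1", "Abstract")]),
}
print(query_collection(docs, "Definition"))
```

ElementTree implements only a subset of XPath (it has no nested path predicates such as `[w:pPr/w:pStyle/@w:val='Definition']`), which is why the style test is done in Python here. With lxml's `xpath()` method, or an XQuery processor run over the extracted `document.xml` files, the same selection can be written as a single expression.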