http://pdfbox.apache.org/userguide/text_extraction.html
[0] https://github.com/coolwanglu/pdf2htmlEX