PDFBox 1.8's less-than-great rendering engine forced us to include a separate library just for rendering.
Moving to PDFBox 2.0 is also on our roadmap, but the text extraction API changed a lot in 2.0 too, so porting our engine would take quite a bit of effort.
Friendly reminder: we're an MIT-licensed open source project, and we're always open to contributions!