Hacker News new | past | comments | ask | show | jobs | submit login

On the contrary, I have found lxml suitable for all of my scraping projects where the objective is to write some XPath to parse or extract some data from some element.



LXML itself is, the problem is that its HTML parser (libxml's really) is an ad-hoc "HTML4" parser which means the tree it builds routinely diverges from a proper HTML5 tree as you'd find in e.g. your browser's developer tools and the way it fixes (or whether it fixes it at all) markup is completely ad-hoc and hard to predict.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: