
lxml itself is; the problem is that its HTML parser (libxml2's, really) is an ad-hoc "HTML4" parser. That means the tree it builds routinely diverges from a proper HTML5 tree, as you'd find in e.g. your browser's developer tools, and the way it fixes broken markup (or whether it fixes it at all) is hard to predict.
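One way to see the divergence for yourself is to dump the tree lxml builds for some sloppy markup and compare it by eye with what your browser's developer tools show for the same input. A minimal sketch, assuming lxml is installed; `SAMPLE` and `lxml_outline` are hypothetical names made up for illustration, and no expected output is shown because the repair lxml applies can vary with the libxml2 version:

```python
# Sketch: outline the element tree that lxml's HTML parser (libxml2) builds,
# so it can be compared with the tree shown in a browser's developer tools.
try:
    from lxml import html  # third-party: pip install lxml
except ImportError:  # keep the sketch runnable where lxml is missing
    html = None

# Hypothetical sample: an unclosed <p>, a <table> inside it, misnested <b>.
SAMPLE = "<p>intro<table><tr><td>cell</td></tr></table><b>bold</p>text</b>"


def lxml_outline(markup):
    """Return an indented outline of element tags as lxml parsed them.

    Returns None when lxml is not available.
    """
    if html is None:
        return None
    root = html.document_fromstring(markup)  # always yields the <html> root
    lines = []

    def walk(node, depth):
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(node.tag, str):
            lines.append("  " * depth + node.tag)
            for child in node:
                walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


if __name__ == "__main__":
    print(lxml_outline(SAMPLE))
```

Paste the same `SAMPLE` string into a blank page in your browser and inspect the DOM; any difference in nesting between the two outlines is the kind of divergence described above.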


