
I did something similar in Java, though it was for a company, so I can't open-source it. FYI, I noticed that the script changed drastically between (roughly) December and January, and in my experience the new version is a lot more reliable. It now uses a multi-pass algorithm that relaxes various criteria if it can't find anything with the strictest settings. It also looks for content DOM nodes and assigns points to both their parent and grandparent nodes. It used to assign points only to the parent, which gave the wrong results in some cases.
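For anyone attempting a port, here is a minimal Python sketch of the two ideas described above. Each content paragraph adds its score to its parent in full and to its grandparent at half weight. A strict first pass ignores paragraphs inside "unlikely" containers, and a second pass drops that filter if the first finds nothing long enough. Everything here is an illustrative assumption and not the script's actual code: the `Node` class, the scoring formula, the hint words, and the thresholds. The real script also relaxes more than one criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    """Hypothetical minimal stand-in for a DOM element (not a real DOM API)."""
    tag: str
    text: str = ""
    attrs: str = ""  # class/id string, checked by the "unlikely" filter
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        # Pre-order traversal of this node and all descendants.
        yield self
        for child in self.children:
            yield from child.walk()


UNLIKELY_HINTS = ("comment", "sidebar", "footer", "nav")  # assumed hint words
MIN_PARAGRAPH_LEN = 25   # assumed: shorter paragraphs are ignored
MIN_ARTICLE_LEN = 250    # assumed: below this, try a more relaxed pass


def paragraph_score(text: str) -> int:
    # One base point, one per comma, plus one per 100 chars (capped at 3).
    return 1 + text.count(",") + min(len(text) // 100, 3)


def is_unlikely(node: Node) -> bool:
    # True if this node or any ancestor carries an "unlikely" hint.
    current: Optional[Node] = node
    while current is not None:
        if any(hint in current.attrs for hint in UNLIKELY_HINTS):
            return True
        current = current.parent
    return False


def score_candidates(root: Node, strip_unlikely: bool) -> Dict[Node, float]:
    scores: Dict[Node, float] = {}
    for node in root.walk():
        if node.tag != "p" or len(node.text) < MIN_PARAGRAPH_LEN:
            continue
        if strip_unlikely and is_unlikely(node):
            continue
        parent = node.parent
        if parent is None:
            continue
        score = paragraph_score(node.text)
        scores[parent] = scores.get(parent, 0) + score  # full credit to parent
        grandparent = parent.parent
        if grandparent is not None:
            # Half credit to grandparent, so content split across sibling
            # wrappers still accumulates on a shared ancestor.
            scores[grandparent] = scores.get(grandparent, 0) + score / 2
    return scores


def article_text(node: Node) -> str:
    return " ".join(n.text for n in node.walk() if n.tag == "p")


def extract(root: Node) -> Optional[Node]:
    # Pass 1 is strict (skip unlikely containers); pass 2 relaxes that filter.
    best: Optional[Node] = None
    for strip_unlikely in (True, False):
        scores = score_candidates(root, strip_unlikely)
        if not scores:
            continue
        best = max(scores, key=scores.get)
        if len(article_text(best)) >= MIN_ARTICLE_LEN:
            break  # good enough at this strictness level
    return best
```

Returning the highest-scoring container instead of a single paragraph is what makes the grandparent credit matter. When a page wraps each paragraph in its own `div`, no single parent wins by much, but the shared ancestor collects half of every paragraph's score.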

In any case, I've been wanting a Python library that does what Readability does for a personal Google App Engine project I have in mind. If anyone knows of one, I'd love to save some time. Otherwise, I'll probably start from Beautiful Soup and try porting Readability on top of it.

Your Ruby code might also be useful if you end up open-sourcing it.



I second this; I would love a Python port of it for GAE.