
Well, unfortunately, I'm not doing much on my end at this point. I do a few small things and then let libots do most of the work. One of my last iteration items was to put some of my own summarization work into it and make libots less of a player, but with only 48 hours I had to prioritize.

I invested most of my cleverness in actually getting the content out of the page since that's really where the money is for an MVP for this; no content == no summary. :)



Yeah, that problem is a real pain. As I mentioned in my post, it's the bit I'm not happy with. I wonder how the Readability tool does it; that seems to do a very good job.

It seems that OTS uses a word frequency strategy, so the algorithm is similar or identical to the one I demoed. Interesting.
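For anyone curious what a word frequency strategy looks like in practice, here is a minimal sketch. It is not OTS's actual code; the stop-word list, the naive sentence splitter, and the function names (`split_sentences`, `tokenize`, `summarize`) are all assumptions made for illustration. It scores each sentence by the average document-wide frequency of its words and keeps the top few in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use larger,
# per-language lists).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "as", "are", "was", "be",
}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words, minus stop words.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)
    # Count how often each non-stop word appears in the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average word frequency, so long sentences are not favored
        # just for being long.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize(article_text, max_sentences=3)` returns a list of sentences. Of course, this only works once the article text has been extracted from the page, which is the harder part discussed above.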


Their JS is out there if you grab it from the Bookmarklet. As in, it is not minified.

I have gone through it carefully, and it is clever.

OTS is definitely word-frequency based.


I'm using an algorithm very similar to what they do with a few clever additions of my own. I started out with something almost identical, but they had a few twists that made it even better, which I then in turn improved on (and HTML5-ified :)).


Why don't you open source your algorithm so more folks can work on it with you? I've been futzing with the Readability JS converted to PHP (but could port it to Ruby or Python), and it would be great to collab and share test files, etc.


Sure I might consider that at some point! Yet another OSS project for me to maintain though... :P


I'd be interested in working on this project; it's a problem I've come across quite a bit. There's even an academic contest for it, called CLEANEVAL, although the way they set up the problem was arguably not quite right.


Let me know if you want some help; this is an area I'm interested in.



