Hacker News

I don't know a whole lot about text analysis or the algorithms mentioned. Can this be used to analyze articles and determine which ones deal with the same subject, Techmeme-style? If not, what would be a good starting point? (Or would this be better off in an 'Ask HN' post? I am one of those horrible new people on here.)

The "tf-idf + cosine similarity + LSA metrics" bit from Pattern is what you are looking for.

In other words, the vector module: http://www.clips.ua.ac.be/pages/pattern-vector
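
To give a rough idea of what tf-idf plus cosine similarity does when comparing articles, here is a minimal sketch in plain Python. It does not use Pattern's API; the function names (`tokenize`, `tfidf_vectors`, `cosine`) and the sample articles are made up for illustration, and the LSA step is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of word -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # tf = relative frequency in this document, idf = log(n / df).
        # Words that occur in every document get weight 0.
        vectors.append(
            {w: (c / total) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample articles: the first two share a subject.
articles = [
    "Apple unveils a new iPhone with a faster chip",
    "The new iPhone from Apple ships with a faster chip and better camera",
    "Central bank raises interest rates to fight inflation",
]
vectors = tfidf_vectors(articles)
print("article 0 vs 1:", cosine(vectors[0], vectors[1]))
print("article 0 vs 2:", cosine(vectors[0], vectors[2]))
```

Articles about the same subject should score noticeably higher against each other than against unrelated ones, so grouping every pair above some threshold is one simple way to get Techmeme-ish clusters. The threshold itself is something you would have to tune on real data.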

