
I just gave an invited talk at KDD about deep learning in which I covered this algorithm, so it's great to see this code appear now.

For anyone interested in text analysis: PLEASE study and use this code and the referenced papers. Its importance is hard to overstate. It is far, far better than all previous approaches to word-level analysis. These representations are the dimensional compression that occurs in the middle of a deep neural net. The resulting vectors encode rich information about the semantics and usage patterns of each word in a very concise way.
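
To make the "rich information" concrete: the well-known analogy result from the word2vec papers is that vector('king') - vector('man') + vector('woman') lands near vector('queen'). A minimal sketch of that lookup (assuming 'vectors' is a hypothetical dict mapping each word to its numpy array, e.g. loaded from word2vec's binary output):

    import numpy as np

    # Assumed: vectors = {word: np.ndarray}, loaded elsewhere from word2vec output.

    def cosine(a, b):
        # Cosine similarity between two vectors.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # king - man + woman should land near queen in the learned space.
    query = vectors['king'] - vectors['man'] + vectors['woman']
    best = max((w for w in vectors if w not in {'king', 'man', 'woman'}),
               key=lambda w: cosine(vectors[w], query))
    print(best)  # expected: 'queen'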

We have barely scratched the surface of the applications of these distributed representations. This is a great time to get started in this field: previous techniques are almost totally obsoleted by this, so everyone is starting from the same point.




I have previously used the Explicit Semantic Analysis (ESA) algorithm for individual word-similarity calculations. ESA uses the text of Wikipedia entries and Wikipedia's ontology as its basis, and it worked quite well.

Do you / does anyone know if there is an easy way to use word2vec to compare the similarity of two different documents (think TF-IDF & cosine similarity)? The page states that "The linearity of the vector operations seems to weakly hold also for the addition of several vectors, so it is possible to add several word or phrase vectors to form representation of short sentences [2]", but the referenced paper has not yet been published.

It would be super interesting if there were a simple way to compare the similarity of two documents using something like this.
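
For what it's worth, a crude baseline along those lines is to average each document's word vectors and compare the averages with cosine similarity. A rough sketch (again assuming a hypothetical 'vectors' dict of word -> numpy array; out-of-vocabulary words are simply skipped):

    import numpy as np

    # Assumed: vectors = {word: np.ndarray}, loaded elsewhere from word2vec output.

    def doc_vector(text, vectors):
        # Average the vectors of all in-vocabulary words in the document.
        # Assumes at least one word of the document is in the vocabulary.
        words = [w for w in text.lower().split() if w in vectors]
        return np.mean([vectors[w] for w in words], axis=0)

    def doc_similarity(a, b, vectors):
        # Cosine similarity between the two averaged document vectors.
        va, vb = doc_vector(a, vectors), doc_vector(b, vectors)
        return np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))

Weighting each word's vector by its TF-IDF score before averaging is a natural refinement of the same idea.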



