I found that explanation to focus almost entirely on negative sampling, without explaining much about the actual Skip-Gram and CBOW models.
However, as I've understood it, negative sampling is a big part of why those models are so computationally efficient, combined with Hierarchical Softmax to reduce the complexity further.
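For intuition on the efficiency point, here's a rough numpy sketch of a single skip-gram negative-sampling update: each (target, context) pair only touches the true context's output vector plus k sampled negative vectors, instead of a full |V|-way softmax. The array names (W_in, W_out) and the helper are made up for illustration, not taken from the linked note.

    import numpy as np

    def sgns_step(W_in, W_out, target, context, neg_ids, lr=0.025):
        # One skip-gram negative-sampling update for a single (target, context) pair.
        v = W_in[target]                                # input vector of the target word
        ids = np.concatenate(([context], neg_ids))      # true context + k negatives
        labels = np.zeros(len(ids))
        labels[0] = 1.0                                 # 1 for the true context, 0 for negatives
        u = W_out[ids]                                  # only k+1 output vectors are touched
        scores = 1.0 / (1.0 + np.exp(-(u @ v)))         # sigmoids, not a |V|-way softmax
        grad = scores - labels                          # gradient of the logistic loss w.r.t. scores
        W_out[ids] -= lr * np.outer(grad, v)            # update the k+1 output vectors
        W_in[target] -= lr * (grad @ u)                 # update the single input vector

    # Usage: 10k-word vocab, 100-dim vectors, 5 negative samples per pair.
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(10_000, 100))
    W_out = np.zeros((10_000, 100))
    sgns_step(W_in, W_out, target=42, context=7, neg_ids=rng.integers(0, 10_000, size=5))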
This current article seems to cover the various choices for constructing 'contexts' (which include skip-gram and CBOW) pretty well.
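For concreteness, here's a tiny sketch of the two context constructions, assuming a symmetric window (the helper below is made up for illustration):

    def training_pairs(tokens, window=2, cbow=False):
        # Yield (input, prediction target) pairs for CBOW or skip-gram.
        for i, target in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if cbow:
                yield context, target          # CBOW: the context words jointly predict the target
            else:
                for c in context:
                    yield target, c            # skip-gram: the target predicts each context word

    sent = "the quick brown fox jumps".split()
    print(list(training_pairs(sent)))              # skip-gram pairs
    print(list(training_pairs(sent, cbow=True)))   # CBOW (context list, target) pairs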
Note that negative sampling and hierarchical softmax are actually alternative ways to interpret the hidden layer and arrive at error values to back-propagate. Each can be used completely independently.
If you enable both, you're training two independent hidden layers, which then update the same shared input vectors in an interleaved fashion. (Essentially, each example is trained jointly: first the hierarchical-softmax codepath nudges the vectors, then the separate negative-sampling codepath nudges them.) So the combination doesn't actually reduce the complexity – it adds to model state size and training time – and I think most projects with large amounts of data just use one or the other (usually just negative sampling).
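If it helps, that independence is visible in gensim's Word2Vec parameters (assuming the gensim 4.x names), where hierarchical softmax and negative sampling are toggled separately and you'd normally pick just one:

    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"],
                 ["jumps", "over", "the", "lazy", "dog"]]

    # Skip-gram with negative sampling only (the common choice).
    m_ns = Word2Vec(sentences, sg=1, hs=0, negative=5, vector_size=100, min_count=1)

    # Skip-gram with hierarchical softmax only (negative=0 disables sampling).
    m_hs = Word2Vec(sentences, sg=1, hs=1, negative=0, vector_size=100, min_count=1)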
Ah, thank you for pointing that out. I guess I got confused in all the papers I've read on the topic recently. It's hard to get into.
However, I would still not agree that the article linked in that comment, which explains negative sampling, explains how word2vec works well enough. Or maybe I just didn't understand it.
Nice paper. I especially like how he has equations, pseudo-code, and Python code snippets. He could turn this paper into a book, adding full Python examples, and I would buy a copy.
Huh, I wrote a few pages on neural networks for Natural Language Processing just a few days ago. Too bad I didn't have access to this. It seems to cover all the different kinds of networks I figured were relevant to mention, and it has a comprehensive explanation of Recursive Neural Networks, which I couldn't really find elsewhere.
I glanced through the entire PDF. While it looks like an outstanding, comprehensive overview of neural networks, it doesn't appear to really address NLP all that much, despite the title.
I would gladly welcome it if you or someone else could write a guide with the comprehensiveness of the PDF above but with more NLP domain-specific discussion and concrete examples.
http://arxiv.org/pdf/1402.3722v1.pdf
https://levyomer.wordpress.com/2014/04/25/word2vec-explained...