
You need to find collocations such as "software engineer" and "The Big Apple" and replace them with "software_engineer" and "The_Big_Apple" in the training corpus, then run regular w2v or GloVe. You will get exactly what you want, and also slightly improved vectors for the rest of the vocabulary.
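A minimal sketch of the replacement step, assuming you already have a list of phrases in hand. The function name `merge_collocations` and the sample sentence are made up for illustration, and matching here is case-sensitive and whitespace-exact, which a real pipeline would probably want to relax.

```python
import re

def merge_collocations(text, phrases):
    """Replace each known multi-word phrase with an underscore-joined token."""
    if not phrases:
        return text
    # Try longer phrases first so "New York City" wins over "New York".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    )
    # Join the words of each matched phrase with underscores.
    return pattern.sub(lambda m: m.group(0).replace(" ", "_"), text)

# Hypothetical one-line corpus, just to show the effect.
corpus = "She is a software engineer living in The Big Apple."
merged = merge_collocations(corpus, ["software engineer", "The Big Apple"])
print(merged)
```

The rewritten text can then be tokenized on whitespace and fed to whatever embedding trainer you use, since each phrase is now a single token.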



The hard part is identifying the collocations in the first place so that you can do that replacement. That remains an imperfect science.
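To give a feel for why it is imperfect, here is a rough sketch of one simple heuristic: the count-based bigram score from the word2vec phrases paper, (count(a b) - delta) / (count(a) * count(b)). It is not a state-of-the-art method. The function name, the toy sentences, and the `min_count`, `delta`, and `threshold` values are all assumptions picked for illustration, and in practice the threshold needs tuning per corpus.

```python
from collections import Counter

def find_collocations(sentences, min_count=2, delta=1.0, threshold=0.15):
    """Score adjacent word pairs and keep the ones above a threshold."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    scored = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue  # too rare to trust
        # delta discounts pairs that co-occur only a handful of times.
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            scored[(a, b)] = score
    return scored

# Hypothetical toy corpus, already tokenized and lowercased.
sentences = [
    ["i", "am", "a", "software", "engineer"],
    ["she", "is", "a", "software", "engineer"],
    ["he", "is", "a", "good", "engineer"],
    ["a", "good", "day"],
]
found = find_collocations(sentences)
print(found)
```

On this toy data only ("software", "engineer") clears the threshold, but frequent pairs like ("is", "a") score close behind it, which is exactly the kind of borderline case that makes the choice of threshold fiddly.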


I've heard good things about "Scalable Topical Phrase Mining from Text Corpora" [1], but it's been a while, so I don't know how close to the state of the art it is.

[1] https://arxiv.org/abs/1406.6312



