visarga on July 16, 2019 | on: King – Man + Woman = King?

You need to find collocations such as "software engineer" and "The Big Apple" and replace them with "software_engineer" and "The_Big_Apple" in the training corpus, then run regular word2vec or GloVe. You will get exactly what you want, and also slightly improved vectors for the rest of the vocabulary.
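
As a rough illustration of the replacement step, here is a minimal sketch that assumes you already have a list of known collocations and a whitespace-tokenized corpus. The function name `merge_collocations` and its parameters are hypothetical, and the case-insensitive, greedy longest-match behavior is an assumption rather than anything the comment specifies.

```python
def merge_collocations(tokens, phrases, joiner="_"):
    """Replace known multi-word phrases in a token list with single joined tokens.

    Matching is case-insensitive and greedy (longest phrase first); both are
    assumptions made for this sketch.
    """
    # Store each phrase as a tuple of lowercased words for fast lookup.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                # Keep the original casing, only join the tokens.
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here, so keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


sentence = "She is a software engineer in The Big Apple".split()
merged = merge_collocations(sentence, ["software engineer", "the big apple"])
```

The merged token lists can then be passed to whatever word2vec or GloVe training setup you already use, since each collocation now looks like an ordinary vocabulary item.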

mr_crankypants on July 16, 2019

Identifying the collocations so that you can do that replacement is the part that remains an imperfect science.
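
To make the difficulty concrete, here is a minimal sketch of one simple count-based way to score candidate bigrams, similar in spirit to the discounted co-occurrence score described for phrase detection in the word2vec literature. It is not the method any commenter here proposes; the function name `score_bigrams`, the `discount` parameter, and the toy corpus are all assumptions for illustration.

```python
from collections import Counter


def score_bigrams(sentences, discount=1.0):
    """Score adjacent word pairs: (count(a, b) - discount) / (count(a) * count(b)).

    Higher scores suggest the pair co-occurs more often than its individual
    word frequencies would predict. The discount penalizes very rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))  # adjacent pairs only

    return {
        (a, b): (n_ab - discount) / (unigrams[a] * unigrams[b])
        for (a, b), n_ab in bigrams.items()
    }


# Toy corpus (hypothetical data, far too small for real use).
corpus = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["i", "love", "big", "dogs"],
]
scores = score_bigrams(corpus)
```

On this toy corpus, "new york" and "i love" receive the same score, which hints at why the problem is hard: a purely statistical score cannot tell a genuine collocation from a merely frequent word pair, and any cutoff you pick is a judgment call.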

yorwba on July 16, 2019

I've heard good things about "Scalable Topical Phrase Mining from Text Corpora" [1], but it's been a while, so I don't know how close to the state of the art it is.

[1] https://arxiv.org/abs/1406.6312
