
You'll find that if you run UMAP on a large corpus (the same size as your original word embeddings), the vectors it generates (especially if you feed it labels, since UMAP supports semi-supervised and supervised dimensionality reduction) should, I'd wager, outperform even those generated by modern transformers. And if they don't, they'll be maybe 2% worse in exchange for a large speed improvement, even with the currently single-threaded implementation of UMAP.
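
Roughly, a semi-supervised run with umap-learn looks like the sketch below; the arrays are hypothetical stand-ins for real word vectors and labels, and -1 marks unlabeled points per umap-learn's masked-target convention:

    import numpy as np
    import umap

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 300))    # stand-in for 300-d word embeddings
    y = rng.integers(0, 10, size=1000)  # stand-in class labels
    y[rng.random(1000) < 0.7] = -1      # mask 70% as unlabeled (semi-supervised)

    # Labels passed via y guide the layout; omit y for plain unsupervised UMAP.
    reducer = umap.UMAP(n_components=64, metric="cosine", random_state=42)
    Z = reducer.fit_transform(X, y=y)
    print(Z.shape)                      # (1000, 64)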

Oh, and you can use UMAP to concatenate lots of vector models together, along with any other side data, for super-loaded embeddings.
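
As a sketch, "super-loaded" here just means stacking the models column-wise before reducing; every array below is a made-up placeholder for whatever models and side data you actually have:

    import numpy as np
    import umap

    n = 1000
    word2vec_like = np.random.normal(size=(n, 300))  # one embedding model
    glove_like    = np.random.normal(size=(n, 200))  # another embedding model
    side_features = np.random.normal(size=(n, 10))   # e.g. frequency stats

    # Concatenate everything into one wide matrix, then fold it into one space.
    stacked = np.hstack([word2vec_like, glove_like, side_features])
    Z = umap.UMAP(n_components=128).fit_transform(stacked)
    print(Z.shape)  # (1000, 128)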


So, do you run UMAP on the PMI matrix or on precomputed word embeddings? Seems like UMAP requires dense vectors as input.


Outperform on what tasks?
