
You'll find that if you run UMAP on a large corpus (the same size as your original word embeddings), the vectors it generates (especially if you feed it labels, since UMAP supports semi-supervised and supervised dimensionality reduction) should, I'd wager, outperform even those generated by modern transformers. And if they don't, they'll be maybe 2% worse in exchange for a large speed improvement, even with the currently single-threaded implementation of UMAP.
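
Roughly, a semi-supervised run with umap-learn looks like the sketch below; the arrays are hypothetical stand-ins for real word vectors and labels, and -1 marks unlabeled points per umap-learn's masked-target convention:

    import numpy as np
    import umap

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 300))    # stand-in for 300-d word embeddings
    y = rng.integers(0, 10, size=1000)  # stand-in class labels
    y[rng.random(1000) < 0.7] = -1      # mask 70% as unlabeled (semi-supervised)

    # Labels passed via y guide the layout; omit y for plain unsupervised UMAP.
    reducer = umap.UMAP(n_components=64, metric="cosine", random_state=42)
    Z = reducer.fit_transform(X, y=y)
    print(Z.shape)                      # (1000, 64)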

Oh, and you can use UMAP to concatenate lots of vector models together, along with any other side data, for super-loaded embeddings.
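
As a sketch, "super-loaded" here just means stacking the models column-wise before reducing; every array below is a made-up placeholder for whatever models and side data you actually have:

    import numpy as np
    import umap

    n = 1000
    word2vec_like = np.random.normal(size=(n, 300))  # one embedding model
    glove_like    = np.random.normal(size=(n, 200))  # another embedding model
    side_features = np.random.normal(size=(n, 10))   # e.g. frequency stats

    # Concatenate everything into one wide matrix, then fold it into one space.
    stacked = np.hstack([word2vec_like, glove_like, side_features])
    Z = umap.UMAP(n_components=128).fit_transform(stacked)
    print(Z.shape)  # (1000, 128)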


So, do you run UMAP on the PMI matrix or on precomputed word embeddings? Seems like UMAP requires dense vectors as input.


Outperform on what tasks?
