...and I have no plans to add NLP tools in gensim. The connection between gensim and tokenizing/tagging/parsing libs is intentionally loose and flexible.
I'm a fan of "do one thing, do it well".
Having said that, it would be great to facilitate "spaCy + gensim" pipelines for users.
For example, the "word vector representations" can be trained easily with gensim, on arbitrary user-specified corpora, whereas spaCy loads something pre-trained, in a specific format. Maybe there's room for some interoperability there?
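One low-tech bridge would be a shared on-disk format. gensim's `KeyedVectors.save_word2vec_format` can write the plain-text word2vec format: a header line with the vocabulary size and dimensionality, then one `word v1 v2 ...` line per word. Recent spaCy versions can import vectors from that format (e.g. via `python -m spacy init vectors`). The sketch below is only an illustration of the format itself, using the standard library. The function names are hypothetical, and the toy dict stands in for vectors actually trained with gensim.

```python
import io


def write_word2vec_text(vectors, fh):
    """Write a {word: list of floats} mapping in the plain-text word2vec format."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(next(iter(vectors.values())))
    # Header line: "<vocabulary size> <vector dimensionality>"
    fh.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError(f"{word!r} has {len(vec)} dims, expected {dim}")
        # One line per word: the token, then its components, space-separated.
        # repr() of a float round-trips exactly through float().
        fh.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def read_word2vec_text(fh):
    """Parse the plain-text word2vec format back into a {word: list of floats} dict."""
    n_words, dim = (int(x) for x in fh.readline().split())
    vectors = {}
    for _ in range(n_words):
        # Assumes tokens contain no whitespace; split() also tolerates trailing spaces.
        word, *values = fh.readline().split()
        if len(values) != dim:
            raise ValueError(f"{word!r} has {len(values)} dims, expected {dim}")
        vectors[word] = [float(x) for x in values]
    return vectors


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for ones trained with gensim on a user corpus.
    toy = {"king": [0.1, 0.2, 0.3], "queen": [0.1, 0.25, 0.3]}
    buf = io.StringIO()
    write_word2vec_text(toy, buf)
    print(buf.getvalue(), end="")
    buf.seek(0)
    assert read_word2vec_text(buf) == toy
```

The appeal of a plain-text format is that neither library has to depend on the other, which fits the "loose and flexible" coupling above; the cost is file size and parse time for large vocabularies, where the binary variant of the format is the usual choice.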