The UD treebanks have made it very easy to offer lots of POS tagging and dependency parsing models under a CC-by-NC license. We'll be putting up more of these for download as spaCy 2 stabilises.

We're mostly worried about saying we "support" a language when we've just trained a tagger on a UD treebank, though. We like at least having the stop words and tokenizer exceptions filled in by a native speaker, so the usual flow has been that someone who needs the functionality makes a pull request.

If you just need the UD model for, say, Bulgarian, you can do:

    python -m spacy train xx /path/to/output_model /path/to/bulgarian-train.conllu /path/to/bulgarian-dev.conllu --no-entities

We don't have a spacy.bg.Bulgarian language class yet, so you can either add one, or use the multi-language class (the xx in the command above), which usually works OK.
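Before kicking off training, it can be worth sanity-checking the .conllu files. Below is a minimal sketch, assuming the standard 10-column tab-separated CoNLL-U layout with the UPOS tag in the fourth column. The helper name summarize_conllu is hypothetical and not part of spaCy; it uses only the standard library.

```python
from collections import Counter


def summarize_conllu(lines):
    """Count sentences, syntactic words, and UPOS tags in CoNLL-U lines.

    `lines` is any iterable of strings, e.g. an open file handle.
    Hypothetical helper for illustration; not part of spaCy.
    """
    n_sents = 0
    n_tokens = 0
    upos = Counter()
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                n_sents += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(
                "expected 10 tab-separated columns, got %d" % len(cols)
            )
        in_sentence = True
        if "-" in cols[0] or "." in cols[0]:
            # Multiword token range (e.g. "1-2") or empty node (e.g. "1.1"):
            # not counted as a syntactic word.
            continue
        n_tokens += 1
        upos[cols[3]] += 1
    if in_sentence:
        # The input did not end with a blank line.
        n_sents += 1
    return n_sents, n_tokens, upos
```

Passing it an open handle on /path/to/bulgarian-train.conllu (opened with encoding="utf8") returns the sentence count, word count, and a Counter of tags. A ValueError here usually means the file is not really tab-separated CoNLL-U, which is cheaper to find out now than partway through a training run.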


