The UD treebanks have made it very easy to offer lots of POS tagging and dependency parsing models under a CC BY-NC license. We'll be putting up more of these for download as spaCy 2 stabilises.
We're mostly worried about saying we "support" a language when we've just trained a tagger on a UD treebank, though. We like at least having the stop words and tokenizer exceptions filled in by a native speaker, so the usual flow has been that someone needs the functionality, and they make a pull request.
If you just need the UD model for, say, Bulgarian, you can do:
python -m spacy train xx /path/to/output_model /path/to/bulgarian-train.conllu /path/to/bulgarian-dev.conllu --no-entities
We don't have a spacy.bg.Bulgarian language class yet, so you can either add one, or use the multi-language class, which usually works OK.
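Before kicking off training, it can be worth checking that the .conllu files parse cleanly. Here is a minimal sketch using only the standard library, assuming the standard 10-column, tab-separated CoNLL-U layout. The function name `summarize_conllu` and the three-word sample sentence are made up for illustration and are not part of spaCy.

```python
from collections import Counter

def summarize_conllu(lines):
    """Count sentences, tokens and UPOS tags in CoNLL-U formatted lines.

    Hypothetical helper for illustration; not a spaCy API.
    """
    n_sents = 0
    n_tokens = 0
    upos = Counter()
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                n_sents += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment / metadata line (e.g. "# text = ...").
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 columns, got %d: %r" % (len(cols), line))
        in_sentence = True
        token_id = cols[0]
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        n_tokens += 1
        upos[cols[3]] += 1  # column 4 is the universal POS tag
    if in_sentence:
        # File did not end with a blank line.
        n_sents += 1
    return n_sents, n_tokens, upos

# Made-up sample sentence, only to show the expected input shape.
SAMPLE = (
    "# text = Това е тест.\n"
    "1\tТова\tтова\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tе\tсъм\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tтест\tтест\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

n_sents, n_tokens, upos = summarize_conllu(SAMPLE.splitlines())
print(n_sents, n_tokens, dict(upos))
```

For a real treebank, you'd pass an open file instead of the sample, e.g. `summarize_conllu(open("/path/to/bulgarian-train.conllu", encoding="utf-8"))`.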