
I don't know whether they have a specific French <--> Chinese model. They might, they might not.

It's hard to train for all n^2 language pairs, so MT systems usually back off to English as a pivot language, i.e., they'll translate French --> English --> Chinese.
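A toy sketch of the pivot fallback, for illustration only. The `make_pivot_translator` helper, the language codes, and the lambda "models" are all hypothetical stand-ins, not how any real MT system is wired up:

```python
from typing import Callable, Dict, Tuple

# A "model" here is just a function from text to text (hypothetical stand-in).
Translator = Callable[[str], str]


def make_pivot_translator(
    models: Dict[Tuple[str, str], Translator], pivot: str = "en"
) -> Callable[[str, str, str], str]:
    """Return a translate(text, src, tgt) function that pivots when needed."""

    def translate(text: str, src: str, tgt: str) -> str:
        if src == tgt:
            return text
        # Use a direct model if one happens to exist for this pair.
        direct = models.get((src, tgt))
        if direct is not None:
            return direct(text)
        # Otherwise back off to the pivot: src -> pivot -> tgt.
        to_pivot = models[(src, pivot)]
        from_pivot = models[(pivot, tgt)]
        return from_pivot(to_pivot(text))

    return translate


# Toy models that only tag the text, so the route taken is visible.
toy_models: Dict[Tuple[str, str], Translator] = {
    ("fr", "en"): lambda s: f"en({s})",
    ("en", "zh"): lambda s: f"zh({s})",
}

translate = make_pivot_translator(toy_models)
result = translate("bonjour", "fr", "zh")
print(result)  # zh(en(bonjour))
```

Whatever the French --> English step loses or distorts is passed straight on to the English --> Chinese step, which is the usual objection to pivoting.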




New neural machine translation architectures are experimenting with pairs of neural encoders / decoders, one pair for each language, plus a shared language-independent vector space for the meaning of all words:

http://arxiv.org/abs/1406.1078

http://arxiv.org/abs/1409.3215

http://arxiv.org/abs/1410.8206

So the total number of models is still linear in the number of languages.
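Back-of-the-envelope counts for the three setups discussed in this thread. The function names are made up for this sketch, and it assumes one model per translation direction and one encoder plus one decoder per language:

```python
def direct_pair_models(n: int) -> int:
    """One model per ordered (source, target) pair: n * (n - 1)."""
    return n * (n - 1)


def pivot_models(n: int) -> int:
    """One model into and one out of the pivot for each non-pivot language."""
    return 2 * (n - 1)


def shared_space_components(n: int) -> int:
    """One encoder and one decoder per language around a shared vector space."""
    return 2 * n


n_languages = 100
counts = (
    direct_pair_models(n_languages),
    pivot_models(n_languages),
    shared_space_components(n_languages),
)
print(counts)  # (9900, 198, 200)
```

So the shared-space setup needs about as many components as pivoting does, but without routing every translation through English text.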

I do not know whether this new generation of translation models is used by the Google Translate app, though.

Also, pairs of languages for which there is a large amount of parallel training data will still be favored.


> Also, pairs of languages for which there is a large amount of parallel training data will still be favored.

Wouldn't Bible translations help?


It might, but:

- the vocabulary and topics covered in the Bible are quite different from today's written and spoken text, especially phone discussions or social network messages.

- other aligned corpora such as http://www.statmt.org/europarl/ are much larger than the Bible (several million tokens for most pairs vs less than 1 million for the Bible)

Agreed that http://www.statmt.org/europarl/ does not cover non-European languages.


> so MT systems usually back off to English as a pivot language

That's an interesting choice, because English lacks features that some other languages have, so pivoting through English distorts the result. I remember considerable work from different sources a while back on constructing artificial languages for this purpose, to mitigate the ambiguity introduced by using an existing natural language as the pivot. I'm surprised that a natural-language pivot is the state of the art (though, given that, I'm not surprised that English is the pivot language).


It's just weight of research hours, and weight of data. Our English numbers are almost always better in NLP.



