Google Translate might not be as good as people thought

user24 · on Sept 23, 2010

Most SMT systems are trained using the proceedings of the European Court, as it's a huge corpus of multilingual documents that all say the same thing in each language.

This is a large part of why non-European languages typically don't fare that well in Statistical Machine Translation (SMT) systems. The corpuses (corpii?) aren't as large for other languages.

Subject breadth is another problem. Early information retrieval systems were (and many still are) trained on the Wall Street Journal as the corpus, which means they work great for searching on big-business topics but not so well for finding apple pie recipes, since the WSJ doesn't talk much about that ;)

woodson · on Sept 24, 2010

There are some research projects (at rather early stages) developing SMT techniques for translating into languages without big parallel corpora (essentially by bootstrapping such corpora, assisted by active learning). This could be particularly important for keeping smaller languages from disappearing; otherwise fewer and fewer works in those languages will be available (yes, I'm aware that many people consider language death a good thing).

user24 · on Sept 25, 2010

Sounds fascinating; links? research? Applicability to lost languages like Linear A?

crux · on Sept 23, 2010

'(corpii?)'

corpores, if you must.

rudd · on Sept 23, 2010

The plural of corpus is 'corpora': http://en.wikipedia.org/wiki/Text_corpus

user24 · on Sept 23, 2010

(slaps forehead) Of course it is! Thanks :)

kylec · on Sept 23, 2010

    Please apologize for your stupidity.

I wish I had the gall to say this to people

techiferous · on Sept 24, 2010

I couldn't resist: http://pleaseapologizeforyourstupidity.com

thenduks · on Sept 24, 2010

There needs to be a place to actually apologize, toss a textarea in there and show previous apologies below :)

techiferous · on Sept 24, 2010

What an awesome idea. I'm adding that to my to-do list.

david_p · on Sept 23, 2010

So far, I'm guessing:

  lumber = log
  vomit = throw
  insult to father's stones = thrown in superclass
  wind, pole, and dragon = maybe "try,catch,finally" or "if,then,else"
  goat-time = WTF?

hasenj · on Sept 23, 2010

Wind in Japanese is 'kaze' (風), and the kanji character (according to rikaikun) has a second meaning: method.

http://translate.google.com/#en|ja|wind

Edit:

The kanji for dragon seems to have the reading "ryuu", and that sound (when not written in kanji) has several meanings, one of which is also "method".

http://jisho.org/words?jap=ryuu&dict=edict

"goat" seems to be "yagi", and putting that in jisho.org, one of the results is related to "night shift", so maybe a nightly process? nightly build?

http://jisho.org/words?jap=yagi&dict=edict

donw · on Sept 24, 2010

I can't imagine anybody using '風' for 'method' for anything relating to software. It's usually written in either katakana (メソッド), or the mathematical term for function is used instead (関数).

Really wish I could see the original message...

fauigerzigerk · on Sept 24, 2010

I wish the author of that post could see a translation back into japanese of what he posted. I'd love to laugh together with him :-)

hasenj · on Sept 24, 2010

Could it then just be a general "method", as in "the way to do a certain thing"?

donw · on Sept 24, 2010

Kind of, although I think that 'in the style of' is a more accurate translation. For example, you might see '日本風のステーキ' for 'Steak, in the Japanese style.'

lt · on Sept 23, 2010

runtime maybe.

sfphotoarts · on Sept 23, 2010

I have a Russian friend who used my computer to check her VKontakte (the Russian Facebook). She's not used to Chrome, and it auto-translated the page. At first she didn't notice, because she's equally capable in English; then she started to giggle at the word "pancake" (which is a poor translation), but she said that on the whole it did a really good job. I guess some languages are easier to do. Anecdotal, I know, but it's good enough for me to use Russian websites to read photo comments.

avar · on Sept 23, 2010

Another interesting thing about Google Translate is that people sometimes successfully troll it by amusing the "submit a better translation" feature, which Google ostensibly uses without much checking.

I'd point out an example, but Google's engineers probably read this site, and the examples I have are too valuable for my personal amusement to give up :)

rdela · on Sept 23, 2010

I love amusing the "submit a better translation" feature.

avar · on Sept 23, 2010

Thanks :) That was a fun grammar error.

njharman · on Sept 23, 2010

Did people think it was that good? IME, it translates normal text to and from many languages into something that conveys the gist, if lacking in nuance. Which, IMHO, is flippin' fantastic, even allowing for the odd horrible translation.

waterlesscloud · on Sept 23, 2010

I was recently typing in some phrases from a 19th-century book in French (which I'd found on Google Books), and it was interesting to watch the translation morph as I typed and provided more context. It wasn't a perfect translation in the end, but I was still pretty impressed.

tomjen3 · on Sept 23, 2010

I did. I used it to translate from simplified Chinese to English, and the results were extremely impressive: there were very few mistakes, and some of the paragraphs read like they had been written by a professional translator.

tingley · on Sept 23, 2010

Like most MT engines, Google Translate quality varies wildly by language pair. When MT is used for pre-translation in professional translation settings, it's pretty common to cherry-pick particular engines for particular source and target locales.

xtacy · on Sept 23, 2010

Google Translate sort of works on the premise that at least the input text isn't confusing.

othello · on Sept 23, 2010

If by not confusing you mean not ambiguous, then that's rarely the case in Japanese, which is a very context-sensitive language.

The original Japanese text is probably not at fault here.

rubashov · on Sept 23, 2010

There's almost no such thing as Japanese text that isn't confusing. It is a rather imprecise language. I'm not sure machine translation will ever work well.

techiferous · on Sept 24, 2010

When I worked in Germany, some coworkers were trying to figure out the English word for Betriebshof (bus depot) using AltaVista's Babelfish, which claimed it was "yardyard yard".

bherms · on Sept 23, 2010

We did an experiment in my societies and culture class where we tested translations between several languages using several different services. What's really interesting is translating into a language and back again, or chaining through several languages back to the original, e.g., English->French->German->English. We still have a long way to go before machines can provide perfect translations, but that's part of the fun, right? Pushing the boundaries and finding inventive new ways to bring down the language barrier, so that anyone in the world can communicate with anyone else. We're closer today than we've ever been, and we'll be closer tomorrow, and so on.
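A minimal sketch of that round-trip game, in Python. The three word-for-word dictionaries are hypothetical stand-ins for real translation services (no real API is called), and the loop that repeats the round trip until the text stops changing is just one assumed way to look for an "equilibrium", not a description of any particular site.

```python
# Hypothetical word-for-word tables standing in for real MT services.
EN_FR = {"the": "le", "bus": "bus", "depot": "dépôt", "is": "est", "closed": "fermé"}
FR_DE = {"le": "der", "bus": "Bus", "dépôt": "Lager", "est": "ist", "fermé": "geschlossen"}
DE_EN = {"der": "the", "Bus": "bus", "Lager": "warehouse", "ist": "is", "geschlossen": "closed"}


def translate(sentence: str, table: dict[str, str]) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())


def round_trip(sentence: str, tables: list[dict[str, str]]) -> str:
    """Run the sentence through each translation step in order."""
    for table in tables:
        sentence = translate(sentence, table)
    return sentence


def until_stable(sentence: str, tables: list[dict[str, str]], max_rounds: int = 10) -> list[str]:
    """Repeat the round trip until the text stops changing (or max_rounds is hit)."""
    history = [sentence]
    for _ in range(max_rounds):
        nxt = round_trip(history[-1], tables)
        if nxt == history[-1]:
            break  # fixed point: another round trip changes nothing
        history.append(nxt)
    return history


chain = [EN_FR, FR_DE, DE_EN]  # English -> French -> German -> English
print(until_stable("the bus depot is closed", chain))
# ['the bus depot is closed', 'the bus warehouse is closed']
```

With these toy tables, "depot" drifts to "warehouse" on the first pass, and the second pass returns the same text, so the loop stops. Real engines also reorder and inflect words, so the drift there is far less predictable.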

eru · on Sept 23, 2010

You should try the same games with human translators for comparison, perhaps using Mechanical Turk (mturk) to make it cheap (non-professional, but at least comparable in price to Google Translate).

VBprogrammer · on Sept 24, 2010

Exactly what I was going to point out. There really is no such thing as a perfect translation, given that human translators will use their own interpretation and knowledge of both languages to approximate the same meaning.

jamesbkel · on Sept 23, 2010

You may enjoy this

http://translationparty.com/

scotty79 · on Sept 24, 2010

This is amusing: http://translationparty.com/#7966687

slowpoison · on Sept 24, 2010

How about this one?

http://translationparty.com/#7967523

jodrellblank · on Sept 23, 2010

Also http://tashian.com/multibabel/

robk · on Sept 24, 2010

Many SMT systems are based on European Court proceedings, but Google has had people working on acquiring parallel texts for years now and, as far as I know, has one of the largest digital corpora of parallel text in existence.

Quality grows roughly logarithmically with the volume of unique text available. Thus, there's a rough rule of thumb: for every doubling of the corpus for any language, quality increases by a few points (on a well-known scale of translation quality).
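A worked sketch of that rule of thumb, assuming quality gains a fixed number of points per doubling, i.e. score ≈ base_score + points_per_doubling × log2(words / base_words). The function names and all numbers (10M words scoring 20 points, 3 points per doubling) are hypothetical; the comment only says "a few points" and doesn't name the scale.

```python
import math


def estimated_quality(corpus_words: float, base_words: float,
                      base_score: float, points_per_doubling: float) -> float:
    """Log-scaling rule of thumb: a fixed gain in score for each doubling of the corpus."""
    doublings = math.log2(corpus_words / base_words)
    return base_score + points_per_doubling * doublings


def words_needed(target_score: float, base_words: float,
                 base_score: float, points_per_doubling: float) -> float:
    """Invert the rule: corpus size needed to reach target_score."""
    return base_words * 2 ** ((target_score - base_score) / points_per_doubling)


# Hypothetical baseline: 10M words scores 20 points, +3 points per doubling.
for words in (10e6, 20e6, 40e6, 80e6, 160e6):
    score = estimated_quality(words, base_words=10e6, base_score=20.0, points_per_doubling=3.0)
    print(f"{words / 1e6:.0f}M words -> {score:.1f}")
```

Each row gains the same 3 points, but the amount of extra text needed doubles every time, which is why each further improvement gets more expensive in data.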

The general assumption is that over time this statistical technique, along with the growing data acquisition of Google, will approach human quality.

But you can assume Google has tried to acquire tons of available parallel texts: book translations, government documents (any multilingual government is great; Canada, for example), religious texts, etc. The sky's the limit.

mech4bg · on Sept 24, 2010

I know the initial post was tongue-in-cheek, but Google Translate sometimes seems worse than 1999-2000-era Babelfish. I often (try to) use it to double-check my German before sending off an email, and it inevitably fails dramatically, so I have to check individual words on a decent site like dict.leo.org instead. It seems weird that a small site like that obliterates Google on word accuracy.

torial · on Sept 25, 2010

For a great discussion of informal research comparing the quality of online translation tools (primarily Google, Bing, and Yahoo), see: http://www.tcworld.info/index.php?id=175

The interesting finding was that when the graders were not told which brand produced each translation, Bing's and Yahoo's quality scores rose.

rimantas · on Sept 23, 2010

I was having fun with it when translating from one non-English language to another. The fun part is that it translates Lang1->English->Lang2, and when it doesn't know how to translate word X from Lang1 to English, it sometimes just picks a similar English word and then translates that to Lang2. The result is often hilarious.
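A toy sketch of that pivot failure mode. Spanish and French stand in for the two non-English languages, the dictionaries are hypothetical, and using Python's `difflib.get_close_matches` to pick the most similar-looking English word is an assumption about how a "similar word" might be chosen, not how Google Translate actually works. The Spanish->English table deliberately lacks "embarazada" (Spanish for "pregnant"), so the fallback grabs the false friend "embarrassed".

```python
import difflib

# Hypothetical toy dictionaries; ES_EN is deliberately missing "embarazada".
ES_EN = {"ella": "she", "está": "is", "cansada": "tired"}
EN_FR = {"she": "elle", "is": "est", "tired": "fatiguée",
         "pregnant": "enceinte", "embarrassed": "gênée"}


def to_pivot(word: str) -> str:
    """Spanish -> English; on a miss, guess the English word that looks most similar."""
    if word in ES_EN:
        return ES_EN[word]
    guesses = difflib.get_close_matches(word, list(EN_FR), n=1, cutoff=0.6)
    return guesses[0] if guesses else word  # no close match: pass through unchanged


def pivot_translate(sentence: str) -> str:
    """Spanish -> English -> French, one word at a time."""
    english = [to_pivot(w) for w in sentence.split()]
    return " ".join(EN_FR.get(w, w) for w in english)


print(pivot_translate("ella está cansada"))     # elle est fatiguée
print(pivot_translate("ella está embarazada"))  # elle est gênée ("she is embarrassed")
```

The pivot step is where meaning leaks: once the wrong English word is chosen, the English->French step translates it faithfully, so the error looks perfectly fluent in the output.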

superk · on Sept 26, 2010

I'm just surprised that any engineer can be so lacking in foundational English. Are there any programming languages whose vocabulary is not English?

phreeza · on Sept 23, 2010

Does anyone with knowledge of Japanese have a clue what was actually being asked, and what went wrong?

MikeMacMan · on Sept 23, 2010

I have no idea. I don't know where 'goat time' could come from or why there would be a dragon reference. I've seen a lot of bad translations, but this one is so bizarre that I kind of doubt its authenticity.

ephesus · on Sept 23, 2010

The funniest part is that Nate's response is almost equally bad Japanese.

darren_ · on Sept 24, 2010

Oh, nowhere near as bad, it's readily comprehensible. I think it's machine translated itself though, I was able to get the same phrases out by typing in what I thought the English originals were likely to be (e.g. "私はあなたを助けるつもりです" comes from "I am going to help you").

While doing this I was disappointed to see that Google Translate makes one of the biggest English->Japanese beginner mistakes - overuse of anata ('you'), which you hardly ever use in Japanese.

rflrob · on Sept 24, 2010

It might be bad Japanese (my own Japanese wasn't ever good, and it's been years since I've studied it at all, so I can't evaluate), but it is at least comprehensible.

tjarratt · on Sept 23, 2010

Could you offer a translation? I'm curious how it relates, going the other direction.

auxbuss · on Sept 23, 2010

According to Google Translate <drum roll>:

Mr. Matsumoto, Hi. This is Nate.

Google translator, the incompetence which is vulgar. (Laughs) Make sure to email me directly. You can write in Japanese. I will help you.

Sadly, no goats.

delackner · on Sept 24, 2010

Far from "nearly equally" as bad, at least Nate's text is easily interpreted in the intended way. His grammar may be very stiff and incorrect, but for a non-native speaker who probably doesn't live in Japan, give him some slack.

That said, I did enjoy his construction 「あなたは日本語で書くことができます。私はあなたを助けるつもりです。」Trying to preserve the jerky tone: "You are able to write Japanese! It is my intention to rescue YOU."

harisenbon · on Sept 24, 2010

Wow. Google actually did a pretty good job with that one. Although I have to admit, the sentences are short and unnaturally terse, so Google would have an easy time with it.

It actually sounds better in the Google-translated english than his Japanese. :/

I do like that he wrote his name two different ways in the same message. (Ne-to vs Neito)

abdelazer · on Sept 24, 2010

I look forward to someone re-captioning All Your Base with this text.

9ec4c12949a4f3 · on Sept 23, 2010

Maybe trolls are over-riding the suggest improved translation feature...

labboy · on Sept 23, 2010

Not perfect, and with plenty of delay, but it at least gives a flavor of things and easier-than-before access to good stats on what gets blocked and why.

js2 · on Sept 24, 2010

You know what would have been neat? HN readers working together to reverse-engineer the original meaning. A couple of readers have attempted that. But is that what gets voted to the top of this thread? No, an inane comment sits at the top with 58 points. Disappointing.

devmonk · on Sept 23, 2010

松本武 - At often, the goat-time install a error is vomit.

Agreed.

松本武 - To how many times like the wind, a pole, and the dragon?

I don't know... 5, maybe 7?

松本武 - Install 2,3 repeat, spank, vomit blows

Was that a Windows install?

松本武 - goat-time see like the wind, pole, and dragon?

Goat-time... is that like happy hour?

松本武 - This insult to father's stones?

Ow... don't bring my father's stones into this.

松本武 - JSP error handler with wind, pole, dragon with intercourse to goat-time?

I knew Oracle was taking Java downhill, but whoa... a dragon? No way.

松本武 - Or chance lack of skill with a goat-time?

Would you care to participate in a game of skill?

松本武 - Please apologize for your stupidity.

I'm sorry.

松本武 - There are a many thank you

You are welcome!