Google Translate might not be as good as people thought (groups.google.com)
109 points by robin_reala on Sept 23, 2010 | 55 comments



Most SMT systems are trained using the proceedings of the European Court - as it's a huge corpus of multilingual documents which all have the same meanings.

This is a large factor in why non-European languages typically don't fare that well in Statistical Machine Translation systems. The corpuses (corpii?) aren't as large for other languages.

Subject-broadness is another problem. Early information retrieval systems were trained (and many still are) using the Wall Street Journal as the corpus. Which means they work great for searching on the topic of big business, but not so great on getting apple pie recipes, as WSJ doesn't talk much about that ;)


There are some research projects (in rather early stages) that develop SMT techniques for translation to languages without big parallel corpora (essentially by bootstrapping such corpora, assisted by active learning). This could be of particular importance in keeping smaller languages from disappearing; otherwise fewer and fewer works in those languages will be available (yes, I'm aware that there are many people who consider language death a good thing).


Sounds fascinating; links? research? Applicability to lost languages like Linear A?


'(corpii?)'

corpores, if you must.


The plural of corpus is 'corpora': http://en.wikipedia.org/wiki/Text_corpus


*slaps forehead* Of course it is! Thanks :)


    Please apologize for your stupidity.
I wish I had the gall to say this to people



There needs to be a place to actually apologize, toss a textarea in there and show previous apologies below :)


What an awesome idea. I'm adding that to my to-do list.


So far, I'm guessing:

  lumber = log
  vomit = throw
  insult to father's stones = thrown in superclass
  wind, pole, and dragon = maybe "try,catch,finally" or "if,then,else"
  goat-time = WTF ?


Wind in Japanese is 'kaze' (風), and the kanji character (according to rikaikun) has a second meaning: method.

http://translate.google.com/#en|ja|wind

Edit:

The kanji for dragon seems to have a reading: "ryuu" and that "sound" (if not written in kanji) has several meanings, one of which is also "method"

http://jisho.org/words?jap=ryuu&dict=edict

"goat" seems to be "yagi", and putting that in jisho.org, one of the results is related to "night shift", so maybe a nightly process? nightly build?

http://jisho.org/words?jap=yagi&dict=edict


I can't imagine anybody using '風' for 'method' for anything relating to software. It's usually written in either katakana (メソッド), or the mathematical term for function is used instead (関数).

Really wish I could see the original message...


I wish the author of that post could see a translation back into Japanese of what he posted. I'd love to laugh together with him :-)


Could it then be just a general "method", as in "the way to do a certain thing"?


Kind of, although I think that 'in the style of' is a more accurate translation. For example, you might see '日本風のステーキ' for 'Steak, in the Japanese style.'


runtime maybe.


I have a Russian friend who used my computer to check her VKontakte (the Russian Facebook). She's not used to Chrome, and it auto-translated the page. At first she didn't notice because she's equally capable in English; then she started to giggle at the word "pancake" (which is a poor translation), but she said that on the whole it did a really good job. I guess some languages are easier to do. Anecdotal, I know, but it's good enough for me to use Russian websites to read photo comments.


Another interesting thing about Google Translate is that people sometimes successfully troll it by amusing the "submit a better translation" feature, which Google ostensibly uses without much checking.

I'd point out an example, but Google's engineers probably read this site, and the examples I have are too valuable for my personal amusement to give up :)


I love amusing the "submit a better translation" feature.


Thanks :) That was a fun grammar error.


Did people think it was that good? IME, it translates normal text to and from many languages into something that provides the gist if lacking in the nuances. Which, IMHO, is flippin fantastically great even in the face of the odd horrible translation.


I was recently typing in some phrases from a 19th century book in French (which I'd found on Google Books), and it was interesting to watch the translation morph as I typed and provided more context. It wasn't a perfect translation in the end, but I was still pretty impressed.


I did - I used it to translate from simplified Chinese to English, and the results were extremely impressive - there were very few mistakes and some of the paragraphs read like they had been written by a professional translator.


Like most MT engines, Google Translate quality varies wildly by language pair. When MT is used for pre-translation in professional translation settings, it's pretty common to cherry-pick particular engines for particular source and target locales.


Google Translate sort of works on the premise that at least the input text isn't confusing.


If by not confusing you mean not ambiguous, then that's rarely the case in Japanese, which is a very context-sensitive language.

The original Japanese text is probably not at fault here.


There's almost no such thing as Japanese text that isn't confusing. It is a rather imprecise language. I'm not sure machine translation will ever work well.


When I worked in Germany, some coworkers were trying to figure out the English word for Betriebshof (bus depot) by using Altavista's Babelfish, which claimed it was "yardyard yard".


We did an experiment in my societies and culture class where we tested translations between several different languages using several different services. What's really interesting to see is what happens when you translate from one language to another and back again, or go through multiple languages back to the original, e.g. English->French->German->English. We still have a long way to go before machines can provide perfect translations, but that's part of the fun, right? Pushing the boundaries and finding new, inventive ways to solve the problem of allowing anyone in the world to communicate with anyone else by bringing down the language barrier. We're closer today than we've ever been, and we'll be closer tomorrow, and so on.
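
A minimal sketch of that round-trip experiment, in Python. The translation steps are passed in as plain functions, so nothing here calls a real translation service; `round_trip`, `word_overlap`, and the stand-in `lossy_step` are hypothetical names made up for illustration, and word overlap is only a crude proxy for how much meaning survived.

```python
from functools import reduce


def round_trip(text, steps):
    """Pass text through each translation step in order and return the result."""
    return reduce(lambda current, step: step(current), steps, text)


def word_overlap(original, returned):
    """Jaccard overlap of the lower-cased word sets (1.0 = same vocabulary)."""
    a = set(original.lower().split())
    b = set(returned.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lossy_step(text):
    """Stand-in for a real translator (an assumption): swaps a single word."""
    return text.replace("depot", "yard")


original = "the bus depot"
returned = round_trip(original, [lossy_step])
print(returned, word_overlap(original, returned))
```

To try English->French->German->English, you would pass three functions in `steps`, each wrapping whatever service you are comparing.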


You should try the same games with human translators for comparison. Perhaps using mturk, to make it cheap (non-professional, but at least comparable in price to Google Translate).


Exactly what I was going to point out. There really is no such thing as a perfect translation, given that human translators will use their own interpretation and knowledge of both languages to try to approximate the same meaning.






Many SMT systems are based on the European Court proceedings, but Google's had people working on acquiring parallel texts for years now and has one of the largest corpora of parallel texts in existence digitally, as far as I know.

Quality is logarithmically proportional to the volume of unique text available. Thus, there's a rough formula: for every doubling of the corpus for any language, quality increases by a few points (on a well-known scale of translation quality).

The general assumption is that over time this statistical technique, along with the growing data acquisition of Google, will approach human quality.

But you can assume Google's tried to acquire tons of available parallel texts. Book translations, government (any multilingual gov't is great, Canada for ex), religion, etc. Sky's the limit.
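
The doubling rule in the comment above amounts to score = base + k * log2(N / N0). A rough Python sketch follows; the function name and every number in it are made up for illustration, not measured values for Google Translate or for any real quality scale.

```python
import math


def estimated_quality(corpus_size, base_size, base_score, points_per_doubling):
    """Score predicted by a 'fixed gain per doubling of the corpus' rule."""
    doublings = math.log2(corpus_size / base_size)
    return base_score + points_per_doubling * doublings


# Hypothetical inputs: 1M sentence pairs scores 20 points, +2 points per doubling.
for size in (1_000_000, 2_000_000, 8_000_000):
    print(size, estimated_quality(size, 1_000_000, 20.0, 2.0))
```

The practical consequence is diminishing returns: under these assumed numbers, going from 1M to 2M pairs buys as much as going from 4M to 8M.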


I know the initial post was tongue in cheek, but Google Translate seems to be worse than 1999-2000 era Babelfish at times. I often (try to) use it to double check my German before sending off an email and it inevitably fails dramatically and I have to instead check individual words on a decent site like dict.leo.org. It seems weird that a small site like that obliterates Google for word accuracy.


For a great discussion / informal research on comparing online translation tools (primarily Google, Bing, Yahoo) on quality, see: http://www.tcworld.info/index.php?id=175

The interesting finding was that when they took brand identity away as a criterion given to the graders of the translation, Bing's and Yahoo's quality scores rose.


I was having fun with it when translating from one non-English language to another. The fun part is that it translates Lang1->English->Lang2, and when it does not know how to translate word X from Lang1 to English, it sometimes just chooses a similar English word and then translates that to Lang2. Often the result is hilarious.
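
A toy Python illustration of translating through an English pivot. This is not Google's pipeline: the two tiny dictionaries and the name `pivot_translate` are assumptions for the example, and the failure shown comes from an ambiguous word rather than from guessing a similar-looking one, which is a related but different way for the pivot step to lose meaning.

```python
# Toy single-word dictionaries (assumed data); real systems work on phrases.
DE_TO_EN = {"Schloss": "lock", "Hund": "dog"}  # "Schloss" can also mean "castle"
EN_TO_FR = {"lock": "serrure", "castle": "château", "dog": "chien"}


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Translate word via the pivot language; return None if either step fails."""
    pivot = src_to_pivot.get(word)
    if pivot is None:
        return None
    return pivot_to_tgt.get(pivot)


# If the writer meant "castle", the English pivot has already discarded that
# sense, so the French output is the word for a door lock.
print(pivot_translate("Schloss", DE_TO_EN, EN_TO_FR))
print(pivot_translate("Hund", DE_TO_EN, EN_TO_FR))
```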


I'm just surprised that any engineer can be so lacking in foundational English. Are there any programming languages that have a vocabulary that is not English?


Does anyone with knowledge of Japanese have a clue what was actually being asked, and what went wrong?


I have no idea. I don't know where 'goat time' could come from or why there would be a dragon reference. I've seen a lot of bad translations, but this one is so bizarre that I kind of doubt its authenticity.


The funniest part is that Nate's response is almost equally bad Japanese.


Oh, nowhere near as bad, it's readily comprehensible. I think it's machine translated itself though, I was able to get the same phrases out by typing in what I thought the English originals were likely to be (e.g. "私はあなたを助けるつもりです" comes from "I am going to help you").

While doing this I was disappointed to see that Google Translate makes one of the biggest English->Japanese beginner mistakes - overuse of anata ('you'), which you hardly ever use in Japanese.


It might be bad Japanese (my own Japanese wasn't ever good, and it's been years since I've studied it at all, so I can't evaluate), but it is at least comprehensible.


Could you offer a translation? I'm curious how it relates, going the other direction.


According to Google Translate <drum roll>:

Mr. Matsumoto, Hi. This is Nate.

Google translator, the incompetence which is vulgar. (Laughs) Make sure to email me directly. You can write in Japanese. I will help you.

Sadly, no goats.


Far from "nearly equally" as bad; at least Nate's text is easily interpreted in the intended way. His grammar may be very stiff and incorrect, but for a non-native speaker who probably doesn't live in Japan, give him some slack.

That said, I did enjoy his construction 「あなたは日本語で書くことができます。私はあなたを助けるつもりです。」Trying to preserve the jerky tone: "You are able to write japanese! It is my intention to rescue YOU."


Wow. Google actually did a pretty good job with that one. Although I have to admit, the sentences are short and unnaturally terse, so Google would have an easy time with it.

It actually sounds better in the Google-translated English than his Japanese. :/

I do like that he wrote his name two different ways in the same message. (Ne-to vs Neito)


I look forward to someone re-captioning All Your Base with this text.


Maybe trolls are overriding the "suggest improved translation" feature...


Not perfect, plenty of delayed reaction, but at least gives a flavor and some easier than previous access to good stats on what gets blocked and why


You know what would have been neat? The readers of HN working together to try to reverse engineer the original meaning. There's an attempt at that by a couple of readers. But is that what gets voted to the top of this thread? No, an inane comment sits at the top with 58 points. Disappointing.


松本武 - At often, the goat-time install a error is vomit.

Agreed.

松本武 - To how many times like the wind, a pole, and the dragon?

I don't know... 5, maybe 7?

松本武 - Install 2,3 repeat, spank, vomit blows

Was that a Windows install?

松本武 - goat-time see like the wind, pole, and dragon?

Goat-time... is that like happy hour?

松本武 - This insult to father's stones?

Ow... don't bring my father's stones into this.

松本武 - JSP error handler with wind, pole, dragon with intercourse to goat-time?

I knew Oracle was taking Java downhill, but whoa... a dragon? No way.

松本武 - Or chance lack of skill with a goat-time?

Would you care to participate in a game of skill?

松本武 - Please apologize for your stupidity.

I'm sorry.

松本武 - There are a many thank you

You are welcome!



