Most SMT systems are trained on the proceedings of the European Parliament - it's a huge corpus of multilingual documents that all have the same meaning.
This is a large factor in why non-European languages typically don't fare that well in statistical machine translation systems: the corpora simply aren't as large for other languages.
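To make that concrete: a statistical system essentially counts which words co-occur across aligned sentence pairs, so the quality of its estimates tracks the size of the corpus. A minimal sketch with toy data and naive co-occurrence counting (not Google's actual pipeline):

    from collections import defaultdict

    # Toy parallel corpus: (foreign sentence, English sentence) pairs.
    corpus = [
        ("das haus", "the house"),
        ("das buch", "the book"),
        ("ein buch", "a book"),
    ]

    # Count how often each foreign/English word pair co-occurs.
    counts = defaultdict(float)
    totals = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent.split():
            for e in e_sent.split():
                counts[(f, e)] += 1
                totals[f] += 1

    # Normalize into rough translation probabilities p(e | f).
    # With lots of aligned text these estimates sharpen; with a small
    # corpus (the non-European-language problem) they stay noisy.
    p = {(f, e): c / totals[f] for (f, e), c in counts.items()}
    print(p[("buch", "book")])  # 0.5 on this tiny corpus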
Subject breadth is another problem. Early information retrieval systems were trained (and many still are) on the Wall Street Journal as the corpus, which means they work great for searching on the topic of big business, but not so great for finding apple pie recipes, since the WSJ doesn't talk much about that ;)
There are some research projects (in rather early stages) developing SMT techniques for translation into languages without big parallel corpora, essentially by bootstrapping such corpora with the help of active learning. This could be particularly important for keeping smaller languages from disappearing; otherwise, fewer and fewer works in those languages will be available (yes, I'm aware that many people consider language death a good thing).
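For the curious, the bootstrapping loop in such projects looks roughly like this; every name below is a hypothetical placeholder (with toy stubs), not any project's real API:

    # Active learning sketch: spend scarce human translation effort on
    # the sentences the current model finds hardest, then retrain.
    def uncertainty(model, sentence):
        # Stub: pretend the model is least confident on long sentences.
        return len(sentence.split())

    def ask_human_to_translate(sentence):
        # Stub for the human-in-the-loop step (e.g. a volunteer translator).
        return "<translation of: " + sentence + ">"

    def retrain(model, parallel_corpus):
        # Stub: a real system would re-estimate its statistics here.
        return {"corpus_size": len(parallel_corpus)}

    def grow_corpus(model, pool, parallel_corpus, budget):
        for _ in range(budget):
            hardest = max(pool, key=lambda s: uncertainty(model, s))
            pool.remove(hardest)
            parallel_corpus.append((hardest, ask_human_to_translate(hardest)))
            model = retrain(model, parallel_corpus)
        return model, parallel_corpus

    model, grown = grow_corpus({}, ["short one", "a much longer sentence here"], [], budget=2)
    print(len(grown))  # 2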
I can't imagine anybody using '風' for 'method' in anything relating to software. It's usually written in katakana (メソッド), or else the mathematical term for 'function' (関数) is used instead.
Kind of, although I think that 'in the style of' is a more accurate translation. For example, you might see '日本風のステーキ' for 'Steak, in the Japanese style.'
I have a Russian friend who used my computer to check her VKontakte (Russian Facebook). She's not used to Chrome, and it auto-translated the page. At first she didn't notice, because she's equally capable in English; then she started to giggle at the word 'pancake' (a poor translation), but she said that on the whole it did a really good job. I guess some languages are easier to do. Anecdotal, I know, but it's good enough for me to use Russian websites to read photo comments.
Another interesting thing about Google Translate is that people sometimes successfully troll it by abusing the "submit a better translation" feature, which Google ostensibly uses without much checking.
I'd point out an example, but Google's engineers probably read this site, and the examples I have are too valuable for my personal amusement to give up :)
Did people think it was that good? IME, it translates normal text to and from many languages into something that gives you the gist, if lacking in nuance, which, IMHO, is flippin' fantastic even in the face of the odd horrible translation.
I was recently typing in some phrases from a 19th century book in French (which I'd found on Google Books), and it was interesting to watch the translation morph as I typed and provided more context. It wasn't a perfect translation in the end, but I was still pretty impressed.
I did - I used it to translate from simplified Chinese to English, and the results were extremely impressive - there were very few mistakes, and some of the paragraphs read like they had been written by a professional translator.
Like most MT engines, Google Translate quality varies wildly by language pair. When MT is used for pre-translation in professional translation settings, it's pretty common to cherry-pick particular engines for particular source and target locales.
There's almost no such thing as Japanese text that isn't confusing. It is a rather imprecise language. I'm not sure machine translation will ever work well.
When I worked in Germany, some coworkers were trying to figure out the English word for Betriebshof (bus depot) using AltaVista's Babelfish, which claimed it was "yardyard yard".
We did an experiment in my societies-and-culture class where we tested translations between several languages using several different services. What's really interesting is what happens when you translate into a language and back again, or go through multiple hops back to the original - e.g., English -> French -> German -> English. We still have a long way to go before machines can provide perfect translations, but that's part of the fun, right? Pushing the boundaries and finding new, inventive ways to bring down the language barrier so anyone in the world can communicate with anyone else. We're closer today than we've ever been, and we'll be closer tomorrow, and so on.
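If you want to play the same game yourself, the experiment is just function composition; translate() below is a made-up stand-in (with a fake lookup table) for whichever service you're testing:

    # Toy stand-in for a real translation service; the point is the
    # chaining, not the (invented) translations themselves.
    lookup = {
        ("the spirit is willing", "en", "fr"): "l'esprit est dispose",
        ("l'esprit est dispose", "fr", "de"): "der Geist ist bereit",
        ("der Geist ist bereit", "de", "en"): "the ghost is ready",
    }

    def translate(text, src, tgt):
        return lookup.get((text, src, tgt), text)

    def round_trip(text, chain):
        # Feed each output into the next hop, e.g. en -> fr -> de -> en,
        # and see what survives.
        for src, tgt in zip(chain, chain[1:]):
            text = translate(text, src, tgt)
        return text

    print(round_trip("the spirit is willing", ["en", "fr", "de", "en"]))
    # -> "the ghost is ready": small errors compound at every hop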
You should try the same games with human translators for comparison. Perhaps using Mechanical Turk to keep it cheap; non-professional, admittedly, but at least comparable in price to Google Translate.
Exactly what I was going to point out. There really is no such thing as a perfect translation, given that human translators use their own interpretation and knowledge of both languages to try to approximate the same meaning.
Many SMT systems are based on the European Parliament proceedings, but Google's had people working on acquiring parallel texts for years now and, as far as I know, has one of the largest digital corpora of parallel text in existence.
Quality is roughly proportional to the logarithm of the volume of unique text available. Thus there's a rough formula: for every doubling of the corpus for a given language, quality increases by a few points (on a well-known scale of translation quality).
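Taken literally (the constants below are invented, and I'm assuming the "well-known scale" is something BLEU-like), that rule of thumb is just a log curve:

    import math

    # Illustrative only: the shape matches the rule of thumb above,
    # a few points per doubling of the corpus, but the numbers are made up.
    def estimated_quality(n_sentences, base=20.0, points_per_doubling=2.0):
        return base + points_per_doubling * math.log2(n_sentences / 1e6)

    print(estimated_quality(1e6))  # 20.0
    print(estimated_quality(2e6))  # 22.0 (+2 for one doubling)
    print(estimated_quality(8e6))  # 26.0 (+6 for three doublings)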
The general assumption is that over time this statistical technique, along with Google's ever-growing data acquisition, will approach human quality.
But you can assume Google has tried to acquire every bit of available parallel text: book translations, government documents (any multilingual government is great - Canada, for example), religious texts, etc. The sky's the limit.
I know the initial post was tongue-in-cheek, but Google Translate seems to be worse than 1999-2000-era Babelfish at times. I often (try to) use it to double-check my German before sending off an email, and it inevitably fails dramatically, so I have to check individual words on a decent site like dict.leo.org instead. It seems weird that a small site like that obliterates Google on word accuracy.
For a great discussion and some informal research comparing online translation tools (primarily Google, Bing, and Yahoo) on quality, see: http://www.tcworld.info/index.php?id=175
The interesting finding was that when brand identity was hidden from the graders of the translations, Bing's and Yahoo's quality scores rose.
I was having fun with it translating from one non-English language to another. The fun part is that it translates Lang1 -> English -> Lang2, and when it doesn't know how to translate word X from Lang1 to English, it sometimes just picks a similar English word and then translates that into Lang2. The result is often hilarious.
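Something like this, I assume; the dictionaries and the similarity fallback below are invented just to show the failure mode:

    import difflib

    # Toy Lang1 -> English -> Lang2 pivot; all entries are made up.
    lang1_to_en = {"hund": "dog", "katze": "cat"}
    en_to_lang2 = {"dog": "chien", "cat": "chat", "cart": "charrette"}

    def pivot(word):
        en = lang1_to_en.get(word)
        if en is None:
            # Unknown word: fall back to the most similar-looking
            # English word, as described above.
            en = difflib.get_close_matches(word, en_to_lang2, n=1, cutoff=0.0)[0]
        return en_to_lang2.get(en, en)

    print(pivot("hund"))   # "chien": both hops known
    print(pivot("karte"))  # "charrette": a spelling guess, then translated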
I'm just surprised that any engineer can be so lacking in foundational English. Are there any programming languages whose vocabulary is not English?
I have no idea. I don't know where 'goat time' could come from or why there would be a dragon reference. I've seen a lot of bad translations, but this one is so bizarre that I kind of doubt its authenticity.
Oh, nowhere near as bad - it's readily comprehensible. I think it's machine-translated itself, though: I was able to get the same phrases out by typing in what I thought the English originals were likely to be (e.g. "私はあなたを助けるつもりです" comes from "I am going to help you").
While doing this I was disappointed to see that Google Translate makes one of the biggest English->Japanese beginner mistakes - overuse of anata ('you'), which you hardly ever use in Japanese.
It might be bad Japanese (my own Japanese wasn't ever good, and it's been years since I've studied it at all, so I can't evaluate), but it is at least comprehensible.
Far from "nearly equally" as bad, at least Nate's text is easily interpreted in the intended way. His grammer may be very stiff and incorrect, but for a non-native speaker who probably doesn't live in Japan, give him some slack.
That said, I did enjoy his construction 「あなたは日本語で書くことができます。私はあなたを助けるつもりです。」 Trying to preserve the jerky tone: "You are able to write Japanese! It is my intention to rescue YOU."
Wow. Google actually did a pretty good job with that one.
Although I have to admit, the sentences are short and unnaturally terse, so Google would have an easy time with it.
It actually sounds better in the Google-translated English than in his Japanese. :/
I do like that he wrote his name two different ways in the same message. (Ne-to vs Neito)
You know what would have been neat? The readers of HN working together to try to reverse-engineer the original meaning. There's an attempt at that by a couple of readers. But is that what gets voted to the top of this thread? No, an inane comment sits at the top with 58 points. Disappointing.