Google Prediction API - Google Code

drjoem · on July 18, 2011

The service is pretty cool. Training works by uploading your CSV-formatted training data (with labels) to Google Storage. Then you make a call to train the Google service. Google has not said much about what kind of algorithms they are using behind the scenes, besides the fact that they are using a combination of proprietary and open-source ML algorithms. The service trains a variety of different models and then uses a voting scheme to decide which ones are optimal.
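For illustration only, here is a minimal sketch of what labelled CSV training data might look like when built in Python. The column layout (label first, then the features) and the helper name `to_training_csv` are assumptions for this example, not something confirmed in this thread; the upload to Google Storage and the training call are left out.

```python
import csv
import io

def to_training_csv(rows):
    """Serialize (label, feature, ...) tuples as CSV text.

    Assumption: the label sits in the first column. Check the API docs
    for the layout the service actually expects.
    """
    buf = io.StringIO()
    # Quote every non-numeric field so commas inside text stay in one column.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

# Hypothetical spam/ham examples, one labelled row per line.
csv_text = to_training_csv([
    ("spam", "buy cheap pills now"),
    ("ham", "great post, thanks for sharing"),
])
print(csv_text)
```

Quoting the text fields keeps a comment that contains commas from being split across several columns.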

A few problems I see (or saw; I haven't used it in a few months) with the service are the following.

1. Currently, there is no way to pick your cross-validation folds. This can lead to severe overfitting if your data is not i.i.d.

2. They provide a numerical (double) accuracy number that corresponds to the accuracy estimated from training. How is this number calculated (area under the ROC curve, etc.)? They do not say.

3. Security issues: read the fine print on what happens when your data gets uploaded to Google Storage. It could be a cause for concern.

4. You are competing for resources. When I was testing the API, I would train two successive models with the same amount of data, and one call would complete (asynchronously) after 10 seconds while the next would take 10 minutes. This is because you are competing for resources with other users.

5. Currently there is no way to inject prior knowledge into your models. If you know your data is Gaussian, you might want to use an RBF kernel, but with this API you cannot, because it might pick the Naive Bayes classifier and not the SVM, etc.
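To make point 1 concrete: when rows are not i.i.d. (say, several rows come from the same user), folds should keep each group together so that related rows never end up on both the training and the test side. The following is a minimal in-house sketch of that idea. The function name `group_k_fold` and the example data are hypothetical, and nothing here is part of the Prediction API.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs in which no group is split across folds.

    groups: one group identifier per row (e.g. a user id).
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train, test

# Hypothetical data: six rows from three users, two rows each.
groups = ["u1", "u1", "u2", "u2", "u3", "u3"]
for train, test in group_k_fold(groups, 3):
    print("train:", train, "test:", test)
```

With plain random folds, two rows from the same user can land on opposite sides of a split, which makes the cross-validated accuracy look better than it will be on genuinely new users.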

In general, this service will probably work for the average spam detection problem, but if you really want a great system, you probably need to keep everything in house.

equark · on July 18, 2011

Are there any public benchmarks of the prediction API against standard datasets?

mrspandex · on July 18, 2011

It's not quite clear to me what this actually does. How does training work? Do I give it input and actual decisions so that it can decide based on trends or is it more of a fuzzy match to known data?

jvandenbroeck · on July 18, 2011

It has been a while since I checked it out, but you give it training data (examples + values) and then it will be able to predict values for unseen examples. E.g., predict whether a comment on a blog post is spam.
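As a rough picture of that train-then-predict loop, here is a toy sketch using a small naive Bayes text classifier. This is an illustrative stand-in, not the algorithm Google uses (which has not been disclosed), and the function names and example comments are made up.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (label, text). Returns (per-label word counts, label counts)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the highest naive Bayes log-score for the text."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihood of each word, with add-one smoothing
        # so unseen words do not zero out the score.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Training data: examples plus their known values (labels).
training_data = [
    ("spam", "buy cheap pills now"),
    ("spam", "cheap watches buy now"),
    ("ham", "great post thanks for sharing"),
    ("ham", "thanks this explanation was great"),
]
model = train(training_data)

# Unseen examples: the model extrapolates from word patterns it has seen.
print(predict(model, "buy cheap pills"))            # spam
print(predict(model, "thanks for the great post"))  # ham
```

The model does not look up the new comment among the known ones; it generalizes from patterns in the labelled data (here, word frequencies per label).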

Actually, the Google Prediction API is already really old; I don't know why it is showing up on HN now.

zoudini · on July 18, 2011

I think it has something to do with this submission mentioning it (http://news.ycombinator.com/item?id=2776254).

on July 18, 2011

[deleted]

aristidb · on July 18, 2011

How is this relevant to the prediction API? I must be missing something.

shriphani · on July 18, 2011

Ouch, you're right, this is not relevant. I was playing with language detection and confused language identification in the Prediction API with their Language API. Deleting it now.

rorrr · on July 18, 2011

If it actually works, why wouldn't Google just feed the stock data to it, and make billions by predicting future prices?

gjm11 · on July 19, 2011

Because "prediction" here doesn't mean "magical prediction of the future", it means "spotting and extrapolating patterns", and (in so far as the Efficient Market Hypothesis is true) there are no exploitable patterns to spot in stock-market data.

(Presumably the EMH is only approximately true, but it's probably close enough that Google can't make billions that way without a considerable risk of losing billions instead, and without a lot of effort that they might do better to put into making billions by more conventional means.)

rorrr · on July 20, 2011

> there are no exploitable patterns to spot in stock-market data.

[citation needed]

bauchidgw · on July 18, 2011

Not yet deprecated?

drjoem · on July 18, 2011

DotNetPete1 · on July 19, 2011

I thought this was an April Fools' joke, then I looked at the date.