More data beats better algorithm at predicting Google earnings

danohuiginn · on April 18, 2008

Problem is, I doubt he would have blogged this if the better algorithm had beaten the larger dataset.

As it happens, I broadly agree with his conclusion (data trumps algorithms), but cherrypicking data-points doesn't provide any evidence for it.

mlinsey · on April 18, 2008

I'm not familiar with this guy's site so I'm not sure if he means something more nuanced, but "more data beats better algorithms" is too vague a claim to test by picking any number of datapoints. 100 datapoints will of course beat 1,000 datapoints if the former is selected via a random sampling that uses a uniform distribution across the entire population with no response bias and the latter is selected by asking the first 1,000 people you happen to see.
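A quick simulation makes the sampling point concrete. This is only a sketch under made-up assumptions: the population, the "neighbourhood" you happen to walk through first, and all the numbers are hypothetical and not taken from the linked post.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 people, each with one numeric trait.
# Assumption: the people you "happen to see" first all come from one
# neighbourhood whose trait values run higher than everyone else's.
neighbourhood = [random.gauss(60, 10) for _ in range(10_000)]
everyone_else = [random.gauss(50, 10) for _ in range(90_000)]
population = neighbourhood + everyone_else
true_mean = statistics.fmean(population)

# Small sample: 100 people drawn uniformly at random from the whole population.
random_sample = random.sample(population, 100)

# Large sample: the first 1,000 people encountered, all from the neighbourhood.
convenience_sample = population[:1000]

random_error = abs(statistics.fmean(random_sample) - true_mean)
convenience_error = abs(statistics.fmean(convenience_sample) - true_mean)

print(f"error with 100 random datapoints:        {random_error:.2f}")
print(f"error with 1,000 convenience datapoints: {convenience_error:.2f}")
```

Under these assumptions the 1,000-person estimate is off by roughly the gap between the neighbourhood and the rest of the population, no matter how many more neighbours you ask, while the 100-person random sample only suffers ordinary sampling noise.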

dnaquin · on April 18, 2008

For that conclusion, we need more data.

brent · on April 18, 2008

While I haven't looked in depth at either post, I find this to be an awfully bold claim, and it appears to be based on the results of a small number of example problems. There are probably hundreds of papers available where the opposite is shown to be true.

In fact, many come to the conclusion that the performance of a given algorithm approaches an asymptote as more data is added, at which point implementing a new algorithm or using an ensemble classifier may substantially increase performance.

kf · on April 18, 2008

The blog author isn't trying to say that more data beats a better algorithm in general! That isn't even really what the headline is saying.

The useful thing is that someone with an enormous amount of data is able to predict Google's earnings very accurately. Bookmark their page and check it before the next Google earnings report comes out; you can make some good money by trading the stock.

danohuiginn · on April 18, 2008

erm...that's exactly what he's saying.

"Readers of this blog will be familiar with my belief that more data usually beats better algorithms"

He may not mean what he says, but we can't guess nuances that aren't in his post.

kf · on April 19, 2008

Oh. OK. In that case, his broader point is best ignored in favor of the more interesting prediction about Google's earnings.

mattj · on April 18, 2008

umm.. This has nothing to do with more data. This is someone finding 2 random numbers that, when combined, come out to another random number.

Their claim is the same as this: "Temperatures rose 2% last year. I ate 1% more potatoes and 1% less salmon. Therefore, since 1%+1%=2%, adding the % more potatoes and the % less salmon will predict the change in temperature."

The blog author is, in fact, using only 2 data points to predict a 3rd. He's not using more data, he's using basically no data.

neilc · on April 20, 2008

I have no idea how you could reach that conclusion from the article. The data in question is not "two random numbers":

EF reported a 19.2% increase in paid clicks and 11.2% increase in CPCs at Google Y-O-Y. Do the math (1.192*1.112 = 1.325), that's a 32.5% Y-O-Y revenue increase. That's the closest anyone got to the real numbers!
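To spell out the arithmetic in that quote, here is a minimal sketch. It assumes revenue is simply paid clicks times cost per click (CPC), so the two growth factors multiply; the variable names are mine, not the article's.

```python
# Year-over-year growth figures quoted above.
paid_click_growth = 0.192  # 19.2% more paid clicks
cpc_growth = 0.112         # 11.2% higher cost per click

# Assumption: revenue = paid clicks * CPC, so the growth factors compound.
revenue_factor = (1 + paid_click_growth) * (1 + cpc_growth)
revenue_growth = revenue_factor - 1

# For comparison: naively adding the two percentages drops the cross term
# (0.192 * 0.112) and therefore understates the combined growth.
additive_growth = paid_click_growth + cpc_growth

print(f"compounded factor: {revenue_factor:.6f}")
print(f"compounded growth: {revenue_growth:.4f}")
print(f"additive growth:   {additive_growth:.4f}")
```

The product works out to 1.325504, which is the roughly 32.5% increase quoted above; simply adding the two percentages would give 30.4%.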

yters · on April 19, 2008

Since there isn't a free lunch, randomness wins in the end.