Data for 2009 + 2010 March Madness. Can your algorithm predict the tourney?

aneesh · on March 10, 2010

We had a (small) Hacker News fantasy league for March Madness last year -- the only rule was that your picks had to be by some algorithm which you shared after everyone made their picks. I'd be happy to set up one for this year if there's enough interest.

danger · on March 10, 2010

That sounds fun. I think the rule should be a bit more hardcore, though: the predictions have to come from raw data. That is, no meta-algorithms that use information about seeding or expert predictions, but if somebody wanted to gather, say, play-by-play data and use that, it'd be OK.

aneesh · on March 10, 2010

Ok, I've created a group called "HN" on Yahoo! Sports.

Here's the link: http://y.ahoo.it/mVPMVA8X

CytokineStorm · on March 10, 2010

How is one supposed to run any sort of machine learning algorithm on only two seasons of data? I could understand throwing the stats from the last 15 to 20 seasons into Weka and seeing what it said about 2010, but seriously, how useful is only two seasons' worth of data going to be?

dwine · on March 10, 2010

The data there has the scores from ~5000 games played over the course of each season, and the model he links to also seems quite reasonable to me: http://blog.smellthedata.com/2009/03/data-driven-march-madne...

Don't think of it as two data points. Think of it as two data sets.
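To make the "two data sets" point concrete: every game is one observation, so a single season already gives thousands of equations for a few hundred unknowns. The linked post describes its own model; the sketch below is a different, simpler technique swapped in for illustration, Massey-style least-squares ratings, where each game says roughly rating[a] - rating[b] = margin. The (team, score, team, score) tuple format is an assumption, not the layout of the posted data files, and there is no home-court term.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def massey_ratings(games):
    """Least-squares team ratings from game scores (Massey's method).

    games: list of (team_a, score_a, team_b, score_b) tuples (assumed format).
    Assumes every team is connected to every other through some chain of games.
    Returns {team: rating}, with ratings constrained to sum to zero.
    """
    teams = sorted({g[0] for g in games} | {g[2] for g in games})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Normal equations M r = p, accumulated one game at a time.
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, sa, b, sb in games:
        i, j = idx[a], idx[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += sa - sb
        p[j] += sb - sa
    # Ratings are only defined up to a constant, so M is singular;
    # replace the last equation with "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    r = solve(M, p)
    return {t: r[idx[t]] for t in teams}
```

For example, with A beating B by 10, B beating C by 5 and A beating C by 15, `massey_ratings` gives A about 8.33, B about -1.67 and C about -6.67, so the predicted margin for a new A vs. C game is the rating difference, roughly 15 points. With a real season the equations won't be consistent, and least squares just picks the ratings that fit all ~5000 results best.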

waterlesscloud · on March 10, 2010

And since these are college teams, their ratings can change drastically over the course of more than a year or two...
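One rough way to account for that when pooling both seasons is to down-weight older games. A minimal sketch, assuming each game carries a day index and using a hypothetical `half_life` parameter: a recency-weighted average point margin per team, with exponential decay. It ignores opponent strength, so it is only meant to show the weighting idea.

```python
from collections import defaultdict


def weighted_margins(games, today, half_life=365.0):
    """Recency-weighted average point margin per team.

    games: list of (day, team_a, score_a, team_b, score_b), where `day` counts
    days from some fixed origin (assumed format). A game played `half_life`
    days before `today` counts half as much as one played on `today`.
    """
    num = defaultdict(float)  # weighted sum of margins per team
    den = defaultdict(float)  # sum of weights per team
    for day, a, sa, b, sb in games:
        w = 0.5 ** ((today - day) / half_life)  # exponential decay by game age
        num[a] += w * (sa - sb)
        den[a] += w
        num[b] += w * (sb - sa)
        den[b] += w
    return {t: num[t] / den[t] for t in num}
```

If A lost to B by 10 a year ago and beat B by 20 today, the unweighted average margin for A is +5, while `weighted_margins(games, today=365)` gives +10 because the older loss counts only half as much. Shrinking `half_life` makes the ratings react faster to roster turnover, at the cost of using less data.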

rohanseth · on March 10, 2010

Great idea! Looking forward to seeing your predictions.