
I found a big improvement to my model just after the contest was over, so I made a proposal to develop the improved model and do some more in-depth analysis. I'm currently waiting for their reply!

If they accept, I'll have an opportunity to look deeper -- it's one thing to develop an efficient model, but fully exploiting it to gain a better understanding of the data takes some work. A limitation of these contests is that you're rewarded for producing a very efficient model, but there is very little emphasis on analyzing the model once you've built it. I think it's a shame, because the person who built the model is often in the best position to have a good intuition of both the dataset and why the model had to be built that way.

I've been considering opening a blog, but I haven't found time to do so yet.

Briefly, the purpose of the contest wasn't to understand the effect of breastfeeding, but to understand how important normal child growth is to mental development. They included several scenarios: with all data available, with demographic data removed, and with demographic data and growth curves removed. Unfortunately, IQ is so overwhelmingly affected by demographics that the scenarios without demographic data devolved into a game of extracting all the demographic information leaked by non-demographic variables. And when demographic data is available, more than 90% of the variance explained by the model comes from demographic data rather than biological measurements!

It's really disheartening to think that depending on the social setting you come from, you start with an IQ of 85 or 115 -- at age 7...




Pardon a possibly dumb question, but what kinds of data are considered "demographic" in your specific case? Can you give some examples?



