rm999 on April 3, 2013 | on: Abusing hash kernels for wildly unprincipled machi...

>my goal was just to make it as easy as possible to learn on arbitrary structured data

I'd be very careful about throwing arbitrary data at your learner, at least if you don't understand your data well. Often the predictors and the response are not separated the way they will be in real-world usage (for example, in time). This leads to target leaks, where your model is effectively cheating by using data it won't have in production.

Target leaks are sometimes obvious, as when the classifier performs suspiciously well on in-sample test data, but the repercussions can also be subtle and still very damaging in a production environment.
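To make the "separated in time" point concrete, here is a minimal sketch of a time-based split, where every training row is observed strictly before every evaluation row. The field names (`observed_at`, `clicks`, `label`), the sample rows, and the cutoff date are all made up for illustration; this shows one way to guard against leaks across time, not a complete defense against target leaks.

```python
from datetime import date

def time_split(rows, cutoff, time_key="observed_at"):
    """Split rows so all training rows precede the cutoff date.

    `time_key` is a hypothetical field holding when each row was observed.
    """
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test

# Made-up example rows; the field names are assumptions.
rows = [
    {"observed_at": date(2013, 1, 5), "clicks": 3, "label": 0},
    {"observed_at": date(2013, 2, 10), "clicks": 7, "label": 1},
    {"observed_at": date(2013, 3, 15), "clicks": 1, "label": 0},
]

cutoff = date(2013, 3, 1)
train, test = time_split(rows, cutoff)
```

A random shuffle over the same rows could put the March row in training and the January row in evaluation, which is the kind of separation that won't exist in production.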

mhluongo on April 3, 2013

Hm, couldn't a hybrid approach deal with this? E.g., hash all the data except a few dimensions you think are vital, and add those to the resulting hashed array?
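A rough sketch of what that hybrid might look like, using only the standard library: the non-vital fields go through a signed hashing trick into a fixed-size vector, and the hand-picked fields are appended unhashed. The function names, the bucket count, the choice of MD5 as the hash, and the example record are all assumptions made for illustration, not anything from the article.

```python
import hashlib

def hashed_features(record, n_buckets=16):
    """Signed hashing trick over "key=value" tokens (illustrative sketch)."""
    vec = [0.0] * n_buckets
    for key, value in record.items():
        token = f"{key}={value}".encode("utf-8")
        digest = hashlib.md5(token).digest()  # stable across runs, unlike hash()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign
    return vec

def hybrid_features(record, vital_keys, n_buckets=16):
    """Hash everything except `vital_keys`; append those as raw numbers."""
    vital = [float(record[k]) for k in vital_keys]
    rest = {k: v for k, v in record.items() if k not in vital_keys}
    return hashed_features(rest, n_buckets) + vital

# Made-up record; "age" and "balance" play the role of the vital dimensions.
record = {"country": "US", "browser": "firefox", "age": 34, "balance": 2.5}
features = hybrid_features(record, vital_keys=["age", "balance"])
```

The vital dimensions keep their own columns, so they can never collide with hashed tokens. This does nothing about leaks on its own: a leaky field is just as leaky after hashing.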

mkmkmmmmm on April 4, 2013

Or location-aware hashing?
