
I don't know if I missed it, but I would have liked to see a performance/accuracy comparison between the wide+deep model and a simple ensemble of separate wide and deep models. The advantage of having 2 separate models is that you could use just one or the other if something went wrong, or if you needed a faster prediction (i.e. when the escalator breaks, you get stairs).
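
To be concrete about what I mean, here's a toy sketch with scikit-learn stand-ins for the two halves (the model sizes, the dummy data, and the 0.5 averaging weight are all made up, not from the paper):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    # Placeholder data just so the sketch runs end to end.
    X_train = np.random.rand(500, 20)
    y_train = np.random.randint(0, 2, 500)

    # Two independently trained models; either one can serve alone.
    wide = LogisticRegression(max_iter=1000)           # "wide": linear, memorization
    deep = MLPClassifier(hidden_layer_sizes=(64, 32))  # "deep": MLP, generalization
    wide.fit(X_train, y_train)
    deep.fit(X_train, y_train)

    def ensemble_proba(X, use_deep=True):
        # Fallback path: if the deep model is slow or broken, the wide
        # model alone still answers (the stairs when the escalator breaks).
        if not use_deep:
            return wide.predict_proba(X)[:, 1]
        # Equal-weight average of the two probabilities; 0.5 is a placeholder.
        return 0.5 * (wide.predict_proba(X)[:, 1] + deep.predict_proba(X)[:, 1])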



Yeah, this is a pretty good point. IMO it's a major flaw in the paper that they didn't empirically compare their new model to alternatives (like just an ensemble, as you mention), though they do discuss the idea.

I wouldn't be surprised if a simple ensemble performs better!


In many cases the simple ensemble is more fragile in production because you have to maintain multiple models at once. The winning solution for the Netflix Prize was an unwieldy ensemble that never got used in production. Also, when you update your models you have to retune each of them individually and then manually retune the ensemble weights.

Another advantage of joint learning (which the authors mention) is that the individual components don't need to be as big as they would if trained independently, since they complement each other. The joint model will still be bigger than either individual model, though.
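
For anyone who hasn't read the paper, the joint setup looks roughly like this in Keras (feature widths here are placeholders; also, the paper actually trains the wide part with FTRL and the deep part with AdaGrad, not a single optimizer like below):

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    wide_in = tf.keras.Input(shape=(1000,), name="wide")  # e.g. sparse cross features
    deep_in = tf.keras.Input(shape=(50,), name="deep")    # e.g. dense/embedded features

    h = layers.Dense(64, activation="relu")(deep_in)
    h = layers.Dense(32, activation="relu")(h)

    # Joint training: one logistic output over the concatenation, so both
    # halves are optimized together and each can stay smaller than a
    # standalone model would need to be.
    out = layers.Dense(1, activation="sigmoid")(layers.concatenate([wide_in, h]))

    model = Model(inputs=[wide_in, deep_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy")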


They (reasonably) claim the joint model doesn't have to be as big, but it would be interesting to see, for example, an ensemble of 2 models: a wide model the same size as the wide half of the joint model, and a deep model the same size as the deep half of the joint model.
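
Something like this, reusing the sizes from the joint sketch above (all the sizes are placeholders):

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Same halves as the joint model, but built fresh and trained independently.
    def wide_only(n_wide=1000):
        x = tf.keras.Input(shape=(n_wide,))
        return Model(x, layers.Dense(1, activation="sigmoid")(x))

    def deep_only(n_deep=50):
        x = tf.keras.Input(shape=(n_deep,))
        h = layers.Dense(64, activation="relu")(x)
        h = layers.Dense(32, activation="relu")(h)
        return Model(x, layers.Dense(1, activation="sigmoid")(h))

    # Experiment: train each alone, average their predictions, and compare
    # held-out accuracy/AUC against the jointly trained model of equal size.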



