According to [1], k-NN is pretty far from being the top general classifier. Admittedly, all the data sets are small, with no more than 130,000 instances. When does it start becoming "very hard to beat", and what are you basing that on?

1. Fernández-Delgado et al., "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?", JMLR 15 (2014): http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf
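
For anyone who wants to poke at this claim themselves, here's a minimal sketch with scikit-learn. The dataset and hyperparameters are my own arbitrary choices, not the paper's setup; breast cancer (569 examples) just happens to sit in the same size range as most of the paper's data sets:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # k-NN is distance-based, so scale features first
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    rf = RandomForestClassifier(n_estimators=500, random_state=0)

    for name, clf in [("5-NN", knn), ("random forest", rf)]:
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

On a single small dataset the gap between the two can easily be within the cross-validation noise, which is part of why the paper averaged over 100+ data sets.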




The vast majority of the data sets are very small (on the order of 100 to 1000 examples!). In fact, the authors discarded some of the larger UCI data sets.

It's not surprising that they found random forests perform best, even though the conventional wisdom is that boosting outperforms them: random forests are much harder to overfit, and that matters most on data sets this small.
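
The overfitting point is easy to see on a small, noisy dataset. A toy sketch (synthetic data and all parameter choices are mine, purely illustrative): as you add trees, a forest's test accuracy should stay roughly stable, while boosting's test accuracy will typically peak and then slip as rounds pile up, even as its training accuracy keeps climbing:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Small dataset with 20% label noise: the regime where overfitting bites
    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

    for n in (10, 100, 1000):
        rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
        gb = GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
        print(f"n={n:4d}"
              f"  forest train/test: {rf.score(X_tr, y_tr):.2f}/{rf.score(X_te, y_te):.2f}"
              f"  boosting train/test: {gb.score(X_tr, y_tr):.2f}/{gb.score(X_te, y_te):.2f}")

The intuition: extra trees in a forest just average more bootstrapped estimates, whereas extra boosting rounds keep fitting the residuals, which on noisy data eventually means fitting the noise.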



