
Removing outliers is a common practice in data science. A quick search for "why remove outliers" turns up plenty of discussion about when it should and should not be done; I encourage you to read some of it to get familiar with the arguments.

In this specific case, I suppose they could have added some motivation for their decision, but I don't think they did it in bad faith. They only had 36 samples, and a 3 SD event occurs roughly once in ~300 cases: this hints that the data point might have been erroneous, or that some other factor was at play (e.g. earlier exposure to programming concepts, even disguised as something else).
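
A quick back-of-the-envelope check of that number (a minimal sketch, assuming a normal distribution; the figures are illustrative, not taken from the study):

    # Two-sided probability of a 3 SD event under a Gaussian, and the chance
    # of seeing at least one such event among 36 samples (illustrative only).
    from scipy.stats import norm

    p_3sd = 2 * norm.sf(3)            # ~0.0027, i.e. roughly 1 in 370
    n = 36
    p_any = 1 - (1 - p_3sd) ** n      # ~0.09, i.e. about a 9% chance in 36 samples
    print(p_3sd, p_any)

The two-sided figure is closer to 1 in 370 than to 1 in 300, but the order of magnitude is the same.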

If you are interested in the study, you can find the data and the script in the linked repository: https://github.com/UWCCDL/ComputerWhisperers

The range for the learning rate seems wrong, though. If you check the data, there is no data point with 2.0: https://github.com/UWCCDL/ComputerWhisperers/blob/master/Com...
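
For anyone who wants to verify that locally, a minimal sketch (the file name and column name below are placeholders, not the repository's actual identifiers):

    # Hypothetical check: "data.csv" and "learning_rate" are placeholder names,
    # not the actual file/column in the repository.
    import pandas as pd

    data = pd.read_csv("data.csv")
    print(data["learning_rate"].min(), data["learning_rate"].max())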




> They only had 36 samples, and a 3 SD event occurs roughly once in ~300 cases: this hints that the data point might have been erroneous

Or it hints that the distribution of learning rates is not Gaussian. When there's an "n-sigma" event, it's usually much more likely that the model is wrong than that the event is that rare.
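
One way to see how much the assumed model matters (a minimal sketch; the Student's t with 3 degrees of freedom is just an arbitrary stand-in for a heavy-tailed distribution, not a claim about the actual data):

    # Probability of landing 3 standard deviations from the mean under a
    # Gaussian vs. a heavy-tailed Student's t(3), both scaled to unit variance.
    from scipy.stats import norm, t

    df_t = 3
    sd_t = (df_t / (df_t - 2)) ** 0.5     # sd of a t(3) variate, ~1.73
    p_gauss = 2 * norm.sf(3)              # ~0.0027 (about 1 in 370)
    p_heavy = 2 * t.sf(3 * sd_t, df_t)    # ~0.014  (about 1 in 70)
    print(p_gauss, p_heavy)

Under the heavy-tailed model the same "3-sigma" observation is roughly five times more likely, so a single extreme point says as much about the distributional assumption as about the point itself.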



