
> seems to be mostly irrelevant

Perhaps, but that may be a perspective that is too local in time. Also:

> ReLU activation functions

Why did you pick ReLU, of all the options? The sigmoid makes sense on aesthetic grounds (its derivative has a clean closed form), but from that perspective ReLU is an information cutoff: it discards everything about negative inputs. And with respect to the goal, I am not aware of a theory that defends it as "the activation function that makes sense" beyond raw effectiveness. Are you saying that working applications overwhelmingly use ReLU? If so, which ones?
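To make the contrast concrete, here is a minimal sketch (my own illustration, not from either comment) of the two points above: the sigmoid's derivative has the tidy closed form s·(1−s), while ReLU's gradient is exactly zero for negative inputs, so no signal flows back through them.

```python
import math

def sigmoid(x):
    # Smooth squashing function in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # The "aesthetic" property: the derivative is sigmoid(x) * (1 - sigmoid(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # ReLU maps every negative input to 0 -- the "information cutoff".
    return max(0.0, x)

def relu_grad(x):
    # Gradient is identically zero for x < 0: negative inputs get no learning signal.
    return 1.0 if x > 0 else 0.0

for x in (-2.0, 0.5):
    print(f"x={x}: relu={relu(x)}, relu'={relu_grad(x)}, sigmoid'={sigmoid_grad(x):.4f}")
```

For x = −2.0 both the ReLU output and its gradient are zero, whereas the sigmoid still propagates a (small) gradient everywhere.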



