
It would probably work with careful choice of learning rate, initialization, and weight decay to keep signals small. Batch norm would play a larger role (you'd probably want to use it after the activation fn). I don't see why it would get stuck on either side individually, but it could obviously get stuck if enough signals grow too large on both sides. A rough sketch of that ordering is below.
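A minimal PyTorch sketch of the setup described above, with assumptions: hardtanh stands in for whatever saturating activation the parent thread is about, and the init scale, learning rate, and weight decay values are illustrative, not from the comment. The point shown is just the ordering (linear -> activation -> batch norm) and keeping signals small via init and weight decay.

  import torch
  import torch.nn as nn

  class Block(nn.Module):
      """Linear layer, then a saturating activation, then BatchNorm *after* the activation."""
      def __init__(self, dim):
          super().__init__()
          self.linear = nn.Linear(dim, dim)
          self.act = nn.Hardtanh()       # flat ("stuck") regions on both sides
          self.bn = nn.BatchNorm1d(dim)  # placed after the activation, as suggested

      def forward(self, x):
          return self.bn(self.act(self.linear(x)))

  model = nn.Sequential(Block(64), Block(64), nn.Linear(64, 10))

  # Small initialization keeps pre-activation signals out of the saturated regions.
  for m in model.modules():
      if isinstance(m, nn.Linear):
          nn.init.normal_(m.weight, std=0.01)  # assumed scale
          nn.init.zeros_(m.bias)

  # Weight decay and a modest learning rate help keep the weights (and hence signals) small.
  opt = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)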
