It would probably work with a careful choice of learning rate, initialization, and weight decay to keep the signals small. Batch norm would play a larger role than usual (you'd probably want it after the activation fn, to re-center the post-activation outputs). I don't see why it would get stuck on one side alone, but it could obviously get stuck if enough pre-activations grow large enough to land in the flat regions on both sides, since the gradient vanishes there.
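
Roughly what I mean, as a hypothetical PyTorch sketch — the `Bump` activation here is just a stand-in for an activation that goes flat on both sides (I'm assuming that's the kind of function being discussed), and all the hyperparameters are illustrative, not tuned:

```python
import torch
import torch.nn as nn

# Stand-in activation that saturates on both sides:
# gradient vanishes as |x| grows, so signals must stay small.
class Bump(nn.Module):
    def forward(self, x):
        return torch.exp(-x * x)

model = nn.Sequential(
    nn.Linear(784, 256),
    Bump(),
    nn.BatchNorm1d(256),  # placed after the activation, per the suggestion above
    nn.Linear(256, 10),
)

# Small init keeps pre-activations near the responsive region around 0.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)
        nn.init.zeros_(m.bias)

# Modest learning rate plus weight decay, so weights (and hence
# pre-activations) don't drift out into the flat regions.
opt = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
```

The point of putting batch norm after the activation rather than before is that it normalizes what the next layer actually sees, so even if some units saturate, the downstream signal stays in a usable range.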