Unexpectedly insightful, and it answers some of the same questions I had early on: not just “how” questions, but “why” as well. You see this pattern with softmax quite often. I wish it were taught as a “differentiable argmax” rather than by handing people a formula straight away. That’s not all it is, but that’s how it’s often used.
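
To make the “differentiable argmax” reading concrete, here is a rough sketch in plain Python. The `temperature` parameter and the helper names `softmax` and `one_hot_argmax` are my own illustrative choices, not anything taken from the article. The idea is that softmax returns a smooth probability vector, and as the temperature shrinks that vector approaches the one-hot vector that a hard argmax would pick.

```python
import math

def softmax(scores, temperature=1.0):
    """Softmax over a list of scores; temperature is an illustrative knob."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_argmax(scores):
    """Hard argmax written as a one-hot vector (not differentiable)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]

scores = [1.0, 3.0, 2.0]

# Lower temperatures push the soft output toward the hard one-hot choice.
for t in (1.0, 0.1, 0.01):
    print(t, [round(p, 3) for p in softmax(scores, t)])

print("argmax:", one_hot_argmax(scores))
```

The hard version has zero gradient almost everywhere, which is why the smooth version is the one that shows up inside models trained by gradient descent. As the comment says, that is not the whole story of softmax, only a common way it gets used.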


