
That's true, but the explainer didn't go into any other applications and presented the model strictly as a next-word predictor. If they are going to include the final softmax, they should explain why it's useful. It would be improved by being either simpler (skip the softmax) or more comprehensive (present a use case for the softmax); complexity without a stated reason is bad pedagogy.
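To make the "skip softmax" option concrete, here is a minimal sketch, assuming the predictor just picks the single highest-scoring token (greedy decoding). The logits are made up for illustration, and `softmax` and `argmax` are hypothetical helper names, not anything from the explainer.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest entry.
    return max(range(len(values)), key=lambda i: values[i])

# Made-up scores for a 4-token vocabulary (illustrative only).
logits = [2.0, 0.5, -1.0, 3.2]
probs = softmax(logits)

# Softmax preserves the ordering of the scores, so the top token is the same.
greedy_from_logits = argmax(logits)
greedy_from_probs = argmax(probs)
```

Both picks agree, so for pure greedy next-word prediction the softmax changes nothing. It starts to matter once you need actual probabilities, for example to sample the next token instead of always taking the top one.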


