Note that these theorems show only that there exists a transformer that can solve these problems. They tell you nothing about whether that transformer can actually be found by training with gradient descent on some data, and even if it could, they don't tell you how much data, or what kind, you would need to train it on.