
^ A thought everyone has at some point when processing human text, before learning the hard way (end-of-sentence detection, for example). :P
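To make the end-of-sentence point concrete, here is a minimal sketch of the "obvious" approach. The function name `naive_split` and the sample sentence are made up for illustration; the rule simply splits after `.`, `!` or `?` followed by whitespace.

```python
import re

def naive_split(text):
    # "Dumb" rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

# Hypothetical sample with abbreviations; a human reads it as two sentences.
sample = "Dr. Smith paid $3.50 at 5 p.m. on Jan. 3. Then he left."

pieces = naive_split(sample)
for piece in pieces:
    print(repr(piece))
```

The rule breaks after "Dr.", "p.m." and "Jan." as well as at the real boundaries, so two sentences come out as five fragments. Patching each case with an abbreviation list is where the "learning the hard way" usually starts.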

The difference is that even weak LLMs handle this almost magically, so I wonder what the problem is for the TTS mentioned above.



Kokoro is small and fast because all the text -> phoneme conversion is done by “dumb code” and only the phoneme -> sound part is done using a neural net.
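A rough sketch of that split, as I understand it. This is a toy and not Kokoro's actual code or API: `LEXICON`, `text_to_phonemes`, `phonemes_to_audio` and `fake_model` are all hypothetical names, and the "model" is a stub that returns dummy samples.

```python
# Toy illustration only: hypothetical names, not Kokoro's real code or API.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phonemes(text):
    """Rule-based front end ("dumb code"): no learning involved."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        # Unknown words get a placeholder token here; a real front end
        # would need extra rules for them.
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes

def phonemes_to_audio(phonemes, model):
    """Back end: the only stage that would need a trained network."""
    return model(phonemes)

def fake_model(phonemes):
    # Stand-in for a neural net: one dummy sample per phoneme.
    return [0.0] * len(phonemes)

audio = phonemes_to_audio(text_to_phonemes("Hello, world!"), fake_model)
print(text_to_phonemes("Hello, world!"))
print(text_to_phonemes("Dr. Smith"))
```

In this sketch, anything the front end's rules do not cover ("Dr." here) reaches the model as `<unk>`, so the network never gets a chance to use context the way an LLM would. That is the price of keeping the learned part limited to phoneme -> sound.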



