
The problem isn’t just that everything is text. It’s that everything is a Fourier transform of text in such a way that it’s not actually possible for an LLM to learn to count syllables.



Again, that is using text only.

Imagine a multimodal LLM with a lot more computing resources. It sees your request to count the syllables and realizes it can't do that from text alone (hell, I can't either and have to vocalize it). It then sends your request to an audio module that 'says' the sentence, and then to a listening module that understands syllables and 'hears' the sentence.

This is how it works in most humans. If you do this every day, you'll likely develop some kind of mental shortcut to reduce the effort needed, but at the end of the day there is no unsolvable problem on the AI side.
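
To make the 'say it, then hear it' idea concrete, here is a rough Python sketch. Everything in it is a stand-in I made up for illustration: `speak`, `listen_for_syllables`, and the tiny hand-written pronunciation table are hypothetical, not any real model's audio module. The point is only that once the text has been turned into sounds, counting syllables becomes counting vowel sounds.

```python
# Toy sketch of the "speak, then listen" pipeline. All names are hypothetical.

# Hand-written ARPAbet-style pronunciations for a few words. This table is an
# assumption standing in for a real text-to-speech front end.
TOY_PRONUNCIATIONS = {
    "count": ["K", "AW", "N", "T"],
    "the": ["DH", "AH"],
    "syllables": ["S", "IH", "L", "AH", "B", "AH", "L", "Z"],
    "in": ["IH", "N"],
    "this": ["DH", "IH", "S"],
    "sentence": ["S", "EH", "N", "T", "AH", "N", "S"],
}

# Vowel phonemes; each one is treated as the nucleus of exactly one syllable.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def speak(text):
    """'Say' the text: map each known word to its phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?;:'\"")
        if word not in TOY_PRONUNCIATIONS:
            raise KeyError(f"no toy pronunciation for {word!r}")
        phonemes.extend(TOY_PRONUNCIATIONS[word])
    return phonemes


def listen_for_syllables(phonemes):
    """'Hear' the phonemes: count one syllable per vowel sound."""
    return sum(1 for p in phonemes if p in VOWELS)


def count_syllables(text):
    """Route the request through the speak module, then the listen module."""
    return listen_for_syllables(speak(text))
```

Calling `count_syllables("Count the syllables in this sentence.")` routes the text through both stand-in modules. A real system would need actual speech synthesis and recognition in place of the lookup table, which only covers the six words listed.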



