
The thing with humans is they will say “I don’t remember how many syllables a haiku has” and “what the hell is Kubernetes?” No LLM can reliably produce a haiku, because its tokenization of text deprives it of reliable information about syllable counts. They should all say “I’m sorry, I can’t count syllables, but I’ll try my best anyway.” But the current models don’t do that, because they were trained on text written by humans, who can do haiku, and were never properly taught their own limits by reinforcement learning. It’s Dunning-Kruger gone berserk.
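(For illustration, a toy Python sketch: the subword splits below are invented, not any real tokenizer's output, and the syllable estimate is a crude vowel-group heuristic. The point is only that the pieces a tokenizer sees don't line up with syllable boundaries.)

    import re

    def rough_syllables(word: str) -> int:
        # Crude estimate: count groups of consecutive vowels.
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    # Invented subword splits -- real tokenizers produce different pieces,
    # but the mismatch with syllable boundaries is the point.
    toy_tokens = {
        "refrigerator": ["ref", "riger", "ator"],  # 3 pieces vs. ~5 syllables
        "strengths": ["strength", "s"],            # 2 pieces vs. ~1 syllable
    }

    for word, pieces in toy_tokens.items():
        print(f"{word}: {len(pieces)} pieces, ~{rough_syllables(word)} syllables")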



Eh, it's not D&K gone berserk; it's what happens when you attempt to compress reality down to a single dimension (text). If you're doing a haiku, you will likely subvocalize it to ensure you're saying it correctly. It will be interesting when we get multimodal AI that can speak and listen to itself to detect things like this.


The problem isn’t just that everything is text. It’s that everything is a Fourier transform of text in such a way that it’s not actually possible for an LLM to learn to count syllables.


Again, that is just a limitation of using text alone.

Imagine you have a lot more computing resources in a multimodal LLM. It sees your request to count the syllables and realizes it can't do that from text alone (hell, I can't either and have to vocalize it). It then sends your request to an audio module that 'says' the sentence, and a listening module that understands syllables 'hears' the sentence.

This is how it works in most humans. If you do this every day you'll likely build some kind of mental shortcut to reduce the effort needed, but at the end of the day there is no unsolvable problem on the AI side.
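(A rough sketch of that pipeline in Python. Every module name and the tiny phoneme table are invented for illustration; a real system would call an actual TTS engine and phoneme aligner. The text model defers to an audio round trip instead of guessing, and the 'listener' counts one syllable per vowel phoneme.)

    from dataclasses import dataclass

    @dataclass
    class Audio:
        # Stand-in for a synthesized waveform plus aligned phonemes.
        phonemes: list[str]  # e.g. ["AH0", "N", "OW1", ...]

    VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
              "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

    def speak(text: str) -> Audio:
        # Toy TTS stand-in: looks words up in a tiny hard-coded table.
        table = {
            "an": ["AH0", "N"],
            "old": ["OW1", "L", "D"],
            "silent": ["S", "AY1", "L", "AH0", "N", "T"],
            "pond": ["P", "AA1", "N", "D"],
        }
        phonemes = []
        for word in text.lower().split():
            phonemes += table.get(word, [])
        return Audio(phonemes=phonemes)

    def hear_syllables(audio: Audio) -> int:
        # Each vowel phoneme marks one syllable nucleus.
        return sum(1 for p in audio.phonemes if p.rstrip("012") in VOWELS)

    def count_syllables(line: str) -> int:
        # The text-only model hands the line off instead of guessing.
        return hear_syllables(speak(line))

    print(count_syllables("An old silent pond"))  # -> 5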





