Another recent example: https://github.com/supertone-inc/supertonic
https://huggingface.co/spaces/Supertone/supertonic-2
It seems to be trained by one person, and it sounds surprisingly natural for such a small model.
I remember when TTS always meant the most robotic, barely comprehensible voices.
https://www.reddit.com/r/LocalLLaMA/comments/1qcusnt/soprano...
https://huggingface.co/ekwek/Soprano-1.1-80M