Tried that one. Quality is great but sometimes generations fail and it's rather ...

kristopolous · 2025-05-05T21:22:49 1746480169

alright, dumb question.

(1) I assume these things can do multiple languages

(2) Given (1), can you strip all the languages you aren't using and speed things up?

koljab · 2025-05-05T21:40:45 1746481245

Actually good question.

I'd say probably not. You can't easily "unlearn" things from the model weights (and even if you could, that alone wouldn't help). You could retrain/finetune the model heavily on a single language, but again, that alone does not speed up inference.

To gain speed you'd have to bring the parameter count down and train the model from scratch on a single language only. That might work, but it's also quite probable that it introduces other issues in the synthesis. In a perfect world the model would use all those "free parameters", no longer needed for other languages, for better synthesis of the single trained language. That might be true to a certain degree, but it's not exactly how AI parameter scaling works.
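
A rough sketch of why finetuning alone doesn't help with speed, assuming a plain stack of dense layers (the layer widths are made up for illustration and are not the real XTTS or Orpheus shapes): the work per forward pass depends on the layer shapes, not on what the weights have learned.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulates for one forward pass through a stack of dense layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical layer widths, chosen only for illustration.
multilingual = [1024, 4096, 4096, 1024]
# Finetuning changes the weight values, not the shapes.
finetuned_single_language = list(multilingual)
# A smaller model trained from scratch has fewer parameters.
smaller_from_scratch = [1024, 2048, 2048, 1024]

for name, sizes in [
    ("multilingual", multilingual),
    ("finetuned, single language", finetuned_single_language),
    ("smaller, from scratch", smaller_from_scratch),
]:
    print(f"{name}: {dense_macs(sizes):,} MACs per forward pass")
```

The first two come out identical; only the smaller architecture does less work, which is why you'd need the from-scratch route to gain speed.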

nardi · 2025-05-06T04:34:59 1746506099

I don't know what I'm talking about, but could you use distillation techniques?

koljab · 2025-05-07T12:56:32 1746622592

Maybe possible; I did not look into that much for Coqui XTTS. What I know is that the quantized versions of Orpheus sound noticeably worse. I feel audio models are quite sensitive to quantization.
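
To give a feel for what quantization does to the weights, here is a toy sketch assuming simple symmetric uniform quantization on synthetic values. This is not the scheme the Orpheus quants actually use, and it says nothing about perceived audio quality; it only shows how round-trip error grows as the bit width drops.

```python
import math


def quantize_roundtrip(values, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / qmax      # one scale for the whole tensor
    return [round(v / scale) * scale for v in values]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic "weights", not taken from any real model.
weights = [0.5 * math.sin(0.37 * i) for i in range(1000)]

errors = {bits: rms_error(weights, quantize_roundtrip(weights, bits)) for bits in (8, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit RMS error: {err:.5f}")
```

The 4-bit error is much larger than the 8-bit one. Whether that becomes audible depends on the model, which this toy example cannot tell you.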