> The comment: I noticed that your demo video also had "emotional" video layered on top of the dialogue. This could be considered manipulative; perhaps consider sharing a naked version so we could attempt to interpret the emotion based solely on the text to speech engine.
The music is still there and has an obvious effect.
I thought the demo was impressive, but these things do seem like an effort to distract from (or more accurately bolster the effect of) the core technology.
Though maybe that's the right call, since this is less a strict technical demo and more a way to drive interest/marketing.
The 'high levels of expressivity' comment was more of a flag to me: it's a meaningless phrase on its own, but it's offered as if it were an obvious answer. It feels like a mysterious answer [0].
I recognize, though, that this is a marketing video, the core tech demo is cool, and I'm probably being unfairly critical. Flags like that just make me more skeptical than I would be by default.
Close your eyes