Hugging Face tackles speech-to-speech

GaggiX · on Sept 3, 2024

This is just STT+LLM+TTS. The GPT-4o voice mode that is being released uses a single model to listen and generate audio tokens, which allows a much better understanding of the environment (like understanding two people talking at the same time) and much more powerful speech generation (like singing).

magicmicah85 · on Sept 3, 2024

Got it working on my Mac, a little delayed but usable. I started with a Python problem and then started lying to it about how its suggestion opened a new dimension to hell and demons were now in my house. Then there was the weirdest-sounding audio glitch (like demonic-sounding) and then it said "just make sure to follow the right command and sequence of actions". Soo 10/10?

andito · on Sept 3, 2024

Hugging Face released Speech To Speech, an effort for an open-sourced and modular GPT-4o. With this, you can create a voice assistant that replies in under 500 ms. Its modularity and Apache 2.0 license make it perfect for integrating into any project requiring a powerful voice assistant. It can run locally on a MacBook or be set up on a server. It supports multiple languages, and it can even change languages in under 100 ms after detecting that the user is speaking a different language.

Urahandystar · on Sept 3, 2024

God help people building in this space; the rate of catch-up from competitors is just so quick. How can you get traction when your breakthrough is available for everyone to build on within months?

kelsey98765431 · on Sept 3, 2024

Thank god, now we just need a decent GPT desktop app instrumentation proxy to expose the API, and we can stop using the "screw u scarlet" voices they released for launch.