Hugging Face tackles speech-to-speech (github.com/huggingface)
27 points by andito on Sept 3, 2024 | 5 comments


This is just STT + LLM + TTS. The GPT-4o voice mode that is being released uses a single model to both listen and generate audio tokens, which allows a much better understanding of the environment (like understanding two people talking at the same time) and much more powerful speech generation (like singing).


Got it working on my Mac; a little delayed, but usable. I started with a Python problem, then started lying to it that its suggestion had opened a new dimension to hell and demons were now in my house. Then came the weirdest audio glitch (demonic-sounding, even), after which it said "just make sure to follow the right command and sequence of actions". Soo 10/10?


Hugging Face released Speech To Speech, an effort toward an open-source, modular GPT-4o. With it, you can build a voice assistant that replies in under 500 ms. Its modularity and Apache 2.0 license make it well suited for integration into any project that needs a powerful voice assistant. It can run locally on a MacBook or be deployed on a server. It supports multiple languages, and it can even switch languages within 100 ms of detecting that the user is speaking a different one.
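For anyone wondering what "modular" means in a cascaded STT + LLM + TTS setup, here is a minimal Python sketch. Everything in it is hypothetical: the `CascadePipeline` class, the stage signatures, the toy stubs, and the rule of replying in whatever language STT last detected are illustrations only, not the actual API of the huggingface/speech-to-speech repo. The point is that each stage is just a swappable callable, so you can replace the STT, LLM, or TTS model without touching the others.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (NOT the real speech-to-speech API):
STT = Callable[[bytes], tuple[str, str]]  # audio -> (transcript, detected language code)
LLM = Callable[[str, str], str]           # (prompt, language) -> reply text
TTS = Callable[[str, str], bytes]         # (text, language) -> audio


@dataclass
class CascadePipeline:
    stt: STT
    llm: LLM
    tts: TTS
    language: str = "en"  # language the assistant currently replies in

    def respond(self, audio: bytes) -> bytes:
        # Stage 1: speech-to-text, which also reports the detected language.
        text, detected = self.stt(audio)
        # Assumed switching rule: follow the user's language as soon as it changes.
        if detected != self.language:
            self.language = detected
        # Stage 2: generate a text reply.
        reply = self.llm(text, self.language)
        # Stage 3: text-to-speech in the current language.
        return self.tts(reply, self.language)


# Toy stubs standing in for real models, so the sketch runs end to end.
def fake_stt(audio: bytes) -> tuple[str, str]:
    text = audio.decode("utf-8")
    lang = "fr" if text.startswith("bonjour") else "en"
    return text, lang


def echo_llm(prompt: str, lang: str) -> str:
    return f"[{lang}] you said: {prompt}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = CascadePipeline(stt=fake_stt, llm=echo_llm, tts=fake_tts)
    print(pipeline.respond(b"hello"))
    print(pipeline.respond(b"bonjour"))
```

This also shows the limitation raised in the first comment: the LLM only ever sees a transcript, so anything the STT stage drops (overlapping speakers, tone) never reaches it.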


God help people building in this space; the rate of catch-up from competitors is just so quick. How can you get traction when your breakthrough is available for everyone to build on within months?


Thank god. Now we just need a decent instrumentation proxy for the GPT desktop app that exposes its API, and we can stop using the "screw u scarlet" voices they released for launch.



