
Thank you! We have an architecture diagram and more details in the technical report: https://lemonslice.com/live/technical-report

And yes, exactly. Between each character interaction, we need to run speech-to-text, an LLM, text-to-speech, and then our video model. All of it happens in one continuously streaming pipeline.
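For anyone wondering what a "continuously streaming pipeline" of those four stages might look like, here is a toy sketch. It is not LemonSlice's implementation; the stage functions (`speech_to_text`, `llm`, `text_to_speech`, `video_model`) are hypothetical stubs standing in for real models. The idea shown is that each stage runs concurrently and passes results downstream through a queue as soon as they are ready, so later chunks can still be in speech-to-text while earlier ones are already being rendered to video.

```python
import asyncio

# Sentinel marking end of stream; it is forwarded through every stage.
END = object()

# Hypothetical stand-ins for the real models (assumption: one output per input chunk).
async def speech_to_text(audio_chunk: bytes) -> str:
    return audio_chunk.decode("utf-8")  # pretend the "audio" is already text

async def llm(text: str) -> str:
    return f"reply to '{text}'"

async def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")

async def video_model(audio: bytes) -> str:
    return f"frame({len(audio)} bytes)"

async def stage(fn, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, transform them, and push each result downstream immediately."""
    while True:
        item = await inbox.get()
        if item is END:
            await outbox.put(END)
            return
        await outbox.put(await fn(item))

async def run_pipeline(audio_chunks):
    stages = [speech_to_text, llm, text_to_speech, video_model]
    # One queue between each pair of stages, plus an input and an output queue.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for chunk in audio_chunks:
        await queues[0].put(chunk)
    await queues[0].put(END)

    # Collect rendered "frames" from the last queue until the sentinel arrives.
    frames = []
    while (frame := await queues[-1].get()) is not END:
        frames.append(frame)
    await asyncio.gather(*tasks)
    return frames

if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"hi", b"how are you"])))
```

A real system would stream partial outputs within a turn (partial transcripts, LLM tokens, audio frames) rather than whole chunks, but the queue-per-stage structure is the same.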
