Hacker News

Just wrapped up a live translation feature that watches an HLS live stream, transcribes it live with Whisper, and then translates it into 18 languages. Heavy use of OTP: supervision trees for each stream, ports for managing FFmpeg and receiving audio, async tasks for concurrently submitting chunks to the translation API, etc. Transcriptions and translations are delivered in real time via LiveView to about 4,000 viewers.
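The "async tasks for concurrently submitting chunks to translation API" part maps naturally onto `Task.async_stream/3`. A minimal sketch, assuming a hypothetical `translate_fun` that wraps whatever translation API you use (the module and option names are illustrative, not the author's code):

```elixir
defmodule TranslationFanout do
  # Fans one transcript chunk out to every target language concurrently.
  # `translate_fun` is a hypothetical stand-in for a real translation API client.
  # Returns %{language => translated_text | {:error, :timeout}}.
  def translate_all(text, languages, translate_fun, opts \\ []) do
    languages
    |> Task.async_stream(
      fn lang -> {lang, translate_fun.(text, lang)} end,
      # One in-flight request per language; translation calls are I/O bound.
      max_concurrency: Keyword.get(opts, :max_concurrency, max(length(languages), 1)),
      timeout: Keyword.get(opts, :timeout, 5_000),
      # A slow language is killed and reported instead of crashing the caller.
      on_timeout: :kill_task
    )
    # Results come back in input order, so zipping recovers the language on timeout.
    |> Enum.zip(languages)
    |> Map.new(fn
      {{:ok, {lang, result}}, _lang} -> {lang, result}
      {{:exit, reason}, lang} -> {lang, {:error, reason}}
    end)
  end
end

# Usage with a fake translator:
# TranslationFanout.translate_all("hello", ["es", "fr"], fn t, l -> "#{l}:#{t}" end)
# => %{"es" => "es:hello", "fr" => "fr:hello"}
```

Bounding each call with a timeout means one slow language only drops that language's line for the chunk rather than stalling all 18.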

A little over 3 weeks from `mix new` to the live event. I'm not sure how I could have done it so easily without OTP and LiveView.




That's really cool. Managing ports is something I haven't done in Elixir yet, but managing FFmpeg through Elixir could be very useful for a few things in my current area of focus.

Can you share any details of how that works? E.g. do you operate on a single HLS chunk at a time, or can you get FFmpeg to split a continuous audio stream into pieces, etc.?


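davidw is right that ports are somewhat limited, but I haven't had much trouble doing what I need with FFmpeg in particular. I used the bash wrapper from the `Port` docs[^1], which kills the external program once the port's stdin closes so it isn't left behind as a zombie process, and it has worked well.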

When a stream starts, I start a supervisor that then starts a GenServer to manage the port. On init, the GenServer opens a port for FFmpeg (using the wrapper above) with args that make FFmpeg write 16-bit PCM audio to stdout; the port delivers that audio to the GenServer as messages handled in its `handle_info/2` callback.
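A minimal sketch of such a port-owning GenServer, assuming mono 16 kHz output (the rate Whisper works at). The `:executable`, `:args`, and `:sink` options and the `{:pcm, binary}` message are hypothetical names for illustration; the FFmpeg flags are standard ones, but the exact arguments the author used aren't given:

```elixir
defmodule FFmpegPort do
  # Owns an external process via a port and forwards its stdout bytes to `sink`.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Drop video, decode audio to signed 16-bit little-endian PCM,
  # mono at 16 kHz, written to stdout.
  def ffmpeg_args(url) do
    ["-i", url, "-vn", "-acodec", "pcm_s16le", "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"]
  end

  @impl true
  def init(opts) do
    executable = Keyword.fetch!(opts, :executable)
    args = Keyword.fetch!(opts, :args)
    # :binary delivers data as binaries; :exit_status reports when the program exits.
    port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: args])
    {:ok, %{port: port, sink: Keyword.fetch!(opts, :sink)}}
  end

  @impl true
  def handle_info({port, {:data, pcm}}, %{port: port} = state) do
    # A segment's audio may arrive split across several of these messages.
    send(state.sink, {:pcm, pcm})
    {:noreply, state}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    # Stop so the supervisor restarts us (and therefore FFmpeg).
    {:stop, {:ffmpeg_exited, status}, state}
  end
end

# Production wiring (hypothetical path), with the Port-docs wrapper in front of ffmpeg:
# FFmpegPort.start_link(
#   executable: "/path/to/wrapper.sh",
#   args: ["ffmpeg" | FFmpegPort.ffmpeg_args(hls_url)],
#   sink: audio_buffer_pid
# )
```

Stopping on `:exit_status` instead of trying to reopen the port inside the GenServer keeps restart logic in one place: the supervisor.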

When FFmpeg downloads a new live HLS segment, the entire segment's audio is sent to the GenServer all at once (possibly across a few `handle_info/2` calls, but it happens quickly). Since I want to work in small fixed-size chunks, I send the segment's audio to an AudioBuffer GenServer (started as a sibling under the same supervisor). This buffer uses binary pattern matching to split the audio into chunks exactly 2 seconds long, keeping any remainder in the GenServer's state for the next batch of audio. I then send the chunks to another GenServer, ChunkBuffer, which pops one chunk every 2 seconds for processing.
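The two buffers could look roughly like this. It's a sketch, assuming mono 16 kHz 16-bit PCM (so 2 seconds is 64,000 bytes) arriving as hypothetical `{:pcm, binary}` messages; the `:process` callback and `:interval_ms` option are illustrative names, not the author's:

```elixir
defmodule AudioBuffer do
  # Accumulates raw PCM and emits exact 2-second chunks to a sink process.
  # 16_000 samples/s * 2 bytes/sample * 2 s = 64_000 bytes (mono assumed).
  use GenServer

  @chunk_bytes 16_000 * 2 * 2

  def start_link(sink), do: GenServer.start_link(__MODULE__, sink)

  # Pure helper: peel off whole chunks with binary pattern matching,
  # returning {chunks_in_order, leftover_bytes}.
  def split(buffer, acc \\ [])

  def split(<<chunk::binary-size(@chunk_bytes), rest::binary>>, acc),
    do: split(rest, [chunk | acc])

  def split(remainder, acc), do: {Enum.reverse(acc), remainder}

  @impl true
  def init(sink), do: {:ok, %{sink: sink, remainder: <<>>}}

  @impl true
  def handle_info({:pcm, pcm}, state) do
    {chunks, remainder} = split(state.remainder <> pcm)
    Enum.each(chunks, fn chunk -> send(state.sink, {:chunk, chunk}) end)
    # The leftover (< 2 s) waits in state for the next batch of audio.
    {:noreply, %{state | remainder: remainder}}
  end
end

defmodule ChunkBuffer do
  # Queues chunks and releases one per tick (2 s by default) to `process`.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval_ms, 2_000)
    Process.send_after(self(), :tick, interval)
    {:ok, %{queue: :queue.new(), interval: interval, process: Keyword.fetch!(opts, :process)}}
  end

  @impl true
  def handle_info({:chunk, chunk}, state),
    do: {:noreply, %{state | queue: :queue.in(chunk, state.queue)}}

  def handle_info(:tick, state) do
    Process.send_after(self(), :tick, state.interval)

    case :queue.out(state.queue) do
      {{:value, chunk}, rest} ->
        # Runs synchronously for simplicity; a real pipeline might hand off to a Task.
        state.process.(chunk)
        {:noreply, %{state | queue: rest}}

      {:empty, _} ->
        {:noreply, state}
    end
  end
end
```

Keeping `split/2` pure makes the chunking rule testable without starting any processes; for example, 150,000 bytes yields two 64,000-byte chunks and a 22,000-byte remainder.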

Since everything is supervised, if (when...) FFmpeg crashes, the supervisor just restarts it. Meanwhile, the audio already in the buffers keeps being processed and nothing goes down. There might be a duplicate word or two in the transcription if the restarted port processes a segment again, but everything keeps running smoothly.
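The property described here comes from the `:one_for_one` strategy: only the crashed child is restarted, so the buffers keep their state. A sketch of a per-stream supervisor, where `Agent` processes are placeholders standing in for the real AudioBuffer, ChunkBuffer, and port GenServers (the child ids are hypothetical):

```elixir
defmodule StreamSupervisor do
  # One supervisor per stream. :one_for_one restarts only the child that died,
  # so a crashed FFmpeg port manager doesn't take the buffers down with it.
  use Supervisor

  def start_link(stream_id), do: Supervisor.start_link(__MODULE__, stream_id)

  @impl true
  def init(stream_id) do
    # Buffers start first so they already exist when audio begins to flow.
    # Agents are placeholders for the real GenServers.
    children = [
      Supervisor.child_spec({Agent, fn -> {:audio_buffer, stream_id} end}, id: :audio_buffer),
      Supervisor.child_spec({Agent, fn -> {:chunk_buffer, stream_id} end}, id: :chunk_buffer),
      Supervisor.child_spec({Agent, fn -> {:ffmpeg_port, stream_id} end}, id: :ffmpeg_port)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  # Looks up a child's current pid by id.
  def child_pid(sup, id) do
    Enum.find_value(Supervisor.which_children(sup), fn
      {^id, pid, _type, _modules} -> pid
      _ -> nil
    end)
  end
end
```

Killing the `:ffmpeg_port` child yields a fresh pid for it while `:audio_buffer` keeps its original pid and state.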

For even more reliability, I run the application clustered across four locations in the US, EMEA, and APAC using libcluster[^2]. The stream supervisor is started under a Horde.DynamicSupervisor[^3] with a custom distribution strategy. The strategy prefers the region closest to the company HQ, but if that region goes down, the processes are restarted in another region.
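Horde accepts a custom distribution strategy module, but the exact callbacks aren't spelled out here, so this sketch shows only the selection rule such a strategy might apply: walk an ordered region preference (HQ-closest first) and pick an alive node in the first region that has one. The node names, region atoms, and `choose_node/3` function are hypothetical:

```elixir
defmodule RegionPreference do
  # alive_nodes: node names currently in the cluster.
  # regions:     map of node name => region atom (hypothetical labels).
  # preference:  regions in priority order, HQ-closest first.
  # Returns {:ok, node} or {:error, :no_alive_nodes}.
  def choose_node(alive_nodes, regions, preference) do
    Enum.find_value(preference, {:error, :no_alive_nodes}, fn region ->
      # Sort within a region so the choice is deterministic across the cluster.
      candidates =
        alive_nodes
        |> Enum.filter(&(Map.get(regions, &1) == region))
        |> Enum.sort()

      case candidates do
        [first | _] -> {:ok, first}
        [] -> nil
      end
    end)
  end
end
```

Determinism matters here: every node evaluating the rule on the same membership should agree on where a stream lives, otherwise failover could start it twice. Check Horde's documentation for how to wrap this in its distribution-strategy behaviour.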

[^1]: https://hexdocs.pm/elixir/1.13.4/Port.html#module-zombie-ope...

[^2]: https://github.com/bitwalker/libcluster

[^3]: https://github.com/derekkraan/horde


Absolutely fantastic write up - thank you so much. I will go away and do some further reading!


https://github.com/saleyn/erlexec is pretty good for handling external processes. The builtins aren't quite there if you have more complex use cases.



