I have had very good results with Spectropic [1], a hosted Whisper diarization API. I found it cheap, and much easier and faster than setting up and running whisper-diarization on my M1. Audiogest [2] is a web service built on Spectropic; I have not used it yet.
Disclaimer: I am not affiliated in any way, just a happy customer! I had some nice email exchanges with the (I believe solo) developer behind these tools after filing bug reports.
Thomas here, maker of Spectropic and Audiogest. I am indeed focused on building a simple and reliable Whisper + diarization API. I am also working on providing fine-tuned versions of Whisper for non-English languages through the API.
Feel free to reach out to me if anyone is interested in this!
Great-looking API. Do you support, or plan to support, automatic speaker identification based on labeled samples of speakers' voices? It would be great to have a library of known speakers that are matched automatically when transcribing.
Thanks! That is something I might offer in the future and is definitely possible with a library like pyannote. Would be really cool to add for sure.
I am also experimenting with post-processing transcripts with LLMs to infer speaker names from a transcript. It already works pretty well, but it's still a bit expensive. I have this feature available under the 'enhanced' model if you want to check it out: https://docs.spectropic.ai/models/transcribe/enhanced
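To make the "library of known speakers" idea a bit more concrete, here is a minimal sketch of the matching step only. It assumes each speaker is already represented by a fixed-length embedding vector produced by some speaker-embedding model (for example one obtained through pyannote); the vectors, the `identify` helper, and the 0.7 threshold are all made up for illustration and are not part of Spectropic or pyannote.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(embedding, library, threshold=0.7):
    """Return the best-matching known speaker, or None if nobody is close enough.

    `library` maps a speaker name to a reference embedding (hypothetical data).
    `threshold` is an arbitrary cut-off that would need tuning on real audio.
    """
    best_name, best_score = None, threshold
    for name, reference in library.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional embeddings; real ones have hundreds of dimensions.
known_speakers = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

print(identify([0.9, 0.1, 0.0], known_speakers))
print(identify([0.0, 0.0, 1.0], known_speakers))
```

In practice, each diarized speaker segment would get its own embedding, and any segment that matches nobody in the library would keep a generic label such as "Speaker 1".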
Very informative, thanks! Coincidentally, I just posted a blog post about the complementary approach, using pure HTTP requests without headless browsers: https://news.ycombinator.com/item?id=21770576
Author here, really sorry if it felt like an ad; it is not! I have edited the app to include links to competitors. My point is to encourage people to resist Google's hegemony.
Author here, thank you for suggesting alternatives! I have edited the post to include them. I did not want the article to look like an ad for Fastmail by any means.
Author here, I had completely missed that news. It is a bit concerning, you are totally right. I have edited the article to mention it and alternatives, thank you.
If you mean modeling the state graph itself, it's usually not modeled in the DB but only in the code. It could indeed be interesting to store the graph itself if it evolves often, or at least a version number.
If you were speaking about storing the instances' lifecycles, a simple model using an RDBMS is to store one row per transition event in a separate table. This is what the papertrail[1] gem does, for example.
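A minimal sketch of that "one row per transition" model, using SQLite from Python. The `orders` and `order_transitions` tables, the `ALLOWED` graph, and the `graph_version` column are hypothetical names chosen for illustration; this is not how papertrail itself is implemented.

```python
import sqlite3

# The state graph lives in code; only its version number is stored with each event.
GRAPH_VERSION = 1
ALLOWED = {"new": {"paid", "cancelled"}, "paid": {"shipped"}}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id    INTEGER PRIMARY KEY,
    state TEXT NOT NULL
);
CREATE TABLE order_transitions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id      INTEGER NOT NULL REFERENCES orders(id),
    from_state    TEXT NOT NULL,
    to_state      TEXT NOT NULL,
    graph_version INTEGER NOT NULL,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def transition(conn, order_id, to_state):
    """Move an order to `to_state` and append one row to the transition log."""
    row = conn.execute(
        "SELECT state FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no order with id {order_id}")
    from_state = row[0]
    if to_state not in ALLOWED.get(from_state, set()):
        raise ValueError(f"illegal transition {from_state} -> {to_state}")
    with conn:  # update the current state and log the event atomically
        conn.execute(
            "UPDATE orders SET state = ? WHERE id = ?", (to_state, order_id)
        )
        conn.execute(
            "INSERT INTO order_transitions "
            "(order_id, from_state, to_state, graph_version) VALUES (?, ?, ?, ?)",
            (order_id, from_state, to_state, GRAPH_VERSION),
        )


def history(conn, order_id):
    """Return the (from_state, to_state) pairs for an order, oldest first."""
    rows = conn.execute(
        "SELECT from_state, to_state FROM order_transitions "
        "WHERE order_id = ? ORDER BY id",
        (order_id,),
    )
    return [(r[0], r[1]) for r in rows]


conn.execute("INSERT INTO orders (id, state) VALUES (1, 'new')")
transition(conn, 1, "paid")
transition(conn, 1, "shipped")
print(history(conn, 1))  # [('new', 'paid'), ('paid', 'shipped')]
```

Storing `graph_version` on each row is one cheap way to tell later which version of the in-code graph allowed a given transition.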
"Enterprise" applications such as ERPs or CRMs often have fairly complex workflow features where everything lives in the underlying database: both the high-level workflow/state-machine definitions and the compiled code.
---
[1] https://spectropic.ai/
[2] https://audiogest.app/