
This should be trivially solvable with a glossary as context, as you suggest. I bet the above repo would love a PR, too!


But the error happens in the audio-to-text part, so a text prompt won't solve it. The way to fix it is probably to fine-tune the underlying audio-to-text model.


Doing audio-to-text requires having a statistical model for what word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates where a common word is more likely than an uncommon one. Having a task-specific dictionary at that point would help.
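To make the dictionary idea concrete, here is a minimal sketch of rescoring. It assumes the recognizer exposes candidate words with log-probabilities, which not every system does; the `rescore` function, the `boost` value, and the sample candidates are all made up for illustration.

```python
import math

def rescore(candidates, glossary, boost=2.0):
    """Return the best candidate after adding a log-score bonus to glossary terms.

    candidates: list of (word, log_probability) pairs (hypothetical recognizer output).
    glossary:   iterable of task-specific terms.
    boost:      bonus added to the log-score of any glossary term (arbitrary value).
    """
    glossary_lower = {term.lower() for term in glossary}

    def score(item):
        word, logp = item
        return logp + (boost if word.lower() in glossary_lower else 0.0)

    return max(candidates, key=score)[0]

# Without context the common word wins; with the glossary the rare term does.
candidates = [("cuban", math.log(0.5)), ("kubelet", math.log(0.2))]
print(rescore(candidates, glossary=[]))           # cuban
print(rescore(candidates, glossary=["kubelet"]))  # kubelet
```

The boost plays the role of a prior: it shifts probability mass toward terms you expect in this particular meeting, without changing the acoustic scores themselves.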

One could also imagine doing it at the summary step, where the AI could simply be asked to do the phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants etc. Given the transcription, the meeting context/topics and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
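A rough sketch of assembling such a prompt is below. The `build_correction_prompt` helper and the sample transcript are hypothetical, and the actual call to a language model is left out, since it depends on whichever API you use.

```python
def build_correction_prompt(transcript, terms):
    """Assemble a correction prompt from a transcript and a list of known terms."""
    term_list = "\n".join(f"- {t}" for t in terms)
    return (
        "Here is a transcription of a meeting, followed by a list of "
        "terms, names, and participants.\n"
        "Assume the transcriber has made errors. Replace similar-sounding "
        "words with more likely ones from the list, and change nothing else.\n\n"
        f"Known terms:\n{term_list}\n\n"
        f"Transcription:\n{transcript}\n"
    )

# Hypothetical transcript with two misheard product names.
prompt = build_correction_prompt(
    "We deployed the cooper netties cluster with Anne Sybil.",
    ["Kubernetes", "Ansible"],
)
print(prompt)
```

Telling the model to change nothing else is a guess at how to keep it from rewriting the whole transcript; whether that works well would need testing.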


Whisper accepts a prompt (`initial_prompt` in the open-source package), which can be used to pass in the glossary.
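For the open-source `whisper` package, something like the sketch below should work. The helper names, the "Glossary:" wording, and the `base` model choice are assumptions; only `load_model` and the `initial_prompt` argument to `transcribe` come from the package itself. How strongly the prompt steers the output is something to check on your own audio.

```python
def glossary_prompt(terms):
    """Format glossary terms as a short text prompt (the wording is arbitrary)."""
    return "Glossary: " + ", ".join(terms) + "."

def transcribe_with_glossary(audio_path, terms, model_name="base"):
    """Transcribe a file, passing the glossary as Whisper's initial prompt."""
    import whisper  # imported lazily so the prompt helper works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=glossary_prompt(terms))
    return result["text"]

print(glossary_prompt(["Kubernetes", "Ansible", "Grafana"]))
```

Usage would be along the lines of `transcribe_with_glossary("meeting.wav", ["Kubernetes", "Ansible"])`, where the file name is a placeholder.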



