This should be trivially solvable with a glossary as context, as you suggest. I...

sesm · 2025-02-12T13:54:36 1739368476

But the error happens in the audio-to-text step, so a text prompt won't solve it. The fix is probably to fine-tune the underlying audio-to-text model.

alkonaut · 2025-02-12T15:11:40 1739373100

Doing audio-to-text requires a statistical model of which word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the candidates by general likelihood, where a common word beats an uncommon one. A task-specific dictionary at that point would help, because it makes the domain's uncommon terms more likely.
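To make the dictionary idea concrete, here is a minimal sketch of rescoring a recognizer's candidate list with a glossary. Everything in it is an assumption for illustration: the (text, log-probability) candidate format, the rescore function, and the boost value are hypothetical, not the API of any real speech model.

```python
import math


def rescore(candidates, glossary, boost=2.0):
    """Return the best hypothesis after rewarding glossary terms.

    candidates: list of (text, log_prob) pairs from a hypothetical recognizer.
    glossary:   iterable of task-specific terms (single words here).
    boost:      log-space bonus per glossary word found (arbitrary value).
    """
    terms = {t.lower() for t in glossary}

    def score(item):
        text, log_prob = item
        hits = sum(1 for word in text.lower().split() if word in terms)
        return log_prob + boost * hits

    return max(candidates, key=score)[0]


# Made-up scores: the common phrase outranks the rare term without context.
candidates = [
    ("we deployed the cube control update", math.log(0.6)),
    ("we deployed the kubectl update", math.log(0.4)),
]

no_context = rescore(candidates, [])
with_glossary = rescore(candidates, ["kubectl"])
```

With an empty glossary the common-word hypothesis wins; with "kubectl" in the glossary the rarer hypothesis overtakes it. A real system would apply the bias inside decoding rather than on a finished candidate list.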

One could also imagine doing it at the summary step, where the AI could simply be asked to do the phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants etc. Given the transcription and the meeting context/topics, and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
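A rough sketch of assembling that correction prompt, in case it helps. The function name, the argument names, and the exact wording are my own assumptions; sending the string to a model is left out because it depends on whichever LLM client you use.

```python
def build_correction_prompt(transcript, glossary, topic):
    """Assemble a post-correction prompt for an LLM (hypothetical helper)."""
    term_lines = "\n".join(f"- {term}" for term in glossary)
    return (
        "Here is a transcription of a meeting.\n"
        f"Meeting topic: {topic}\n\n"
        "Known terms, names and participants:\n"
        f"{term_lines}\n\n"
        "Assume the transcriber has made errors. Replace similar-sounding "
        "words and terms with more likely ones from the list above, and "
        "change nothing else.\n\n"
        f"Transcription:\n{transcript}\n"
    )


prompt = build_correction_prompt(
    transcript="we deployed the cube control update on friday",
    glossary=["kubectl", "Grafana"],
    topic="infrastructure rollout",
)
```

Asking the model to change nothing else is a design choice to keep it from rewriting correct text; how well that holds would need checking on real transcripts.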

ukuina · 2025-02-13T04:18:29 1739420309

Whisper accepts a prompt (initial_prompt in the open-source package), so the glossary can be supplied at the audio-to-text step.
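For reference, a minimal sketch of passing a glossary this way with the open-source openai-whisper package. The glossary_prompt helper, the "Glossary:" prefix, and the character budget are assumptions of mine; the prompt is only a hint to the decoder and long prompts get truncated, so the list should stay short.

```python
def glossary_prompt(terms, max_chars=800):
    """Join terms into one prompt string, dropping terms past the budget.

    max_chars is an arbitrary budget on the joined term list, chosen here
    only to keep the prompt short.
    """
    kept = []
    length = 0
    for term in terms:
        added = len(term) + (2 if kept else 0)  # 2 accounts for ", "
        if length + added > max_chars:
            break
        kept.append(term)
        length += added
    return "Glossary: " + ", ".join(kept)


def transcribe_with_glossary(audio_path, terms, model_name="base"):
    """Transcribe a file, biasing Whisper with the glossary prompt."""
    import whisper  # openai-whisper; imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=glossary_prompt(terms))
    return result["text"]


example_prompt = glossary_prompt(["kubectl", "Grafana"])
# text = transcribe_with_glossary("meeting.wav", ["kubectl", "Grafana"])  # hypothetical file
```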