
I recorded myself saying a few sentences from that transcript, then fed it through different Whisper models. "small.en" and "large-v1" both generated "chat GPT", "large-v2" generated "chat-gpt", but somehow "medium.en" correctly generated "ChatGPT".

This was the same audio sample fed through each of those four models, with no "prompting" as you're discussing.

If I add "--initial_prompt ChatGPT", then all four models are able to get the spelling correct.
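For anyone wanting to reproduce this, the invocation with the standard openai-whisper CLI looks roughly like the following (the audio file name is just a placeholder):

```shell
# Transcribe with a vocabulary hint so the model spells "ChatGPT" correctly.
# "sample.wav" is a placeholder for your own recording.
whisper sample.wav --model medium.en --initial_prompt "ChatGPT"
```

The same option is available in the Python API as the `initial_prompt` keyword argument to `transcribe()`.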

Regardless, I don't think "chat GPT" versus "ChatGPT" is a huge deal. There will always be some level of uncertainty and ambiguity in a transcript, and even books written by humans have a few typos that slip past multiple stages of copy editing. Perfection is virtually unachievable, but you can always scroll through the transcript and make some edits after the fact, if desired. Maybe some future model will magically eliminate all typos.



Yeah, it wasn't a big problem for me - I had to do a bunch of other tidy-ups on the transcript anyway, like adding the name of whoever was speaking.

I cleaned that bit up with a bulk replace of "chat GPT" with "ChatGPT" in VS Code.



