While that approach may seem simpler, this project uses a more optimized, faster model, so it ends up being noticeably more efficient.
I used whisper.cpp on an Atom netbook to transcribe a short, old UK video from the '60s into written English (I am not a native speaker). I think it took less than an hour.
I was surprised to see there were no ML-related dependencies (neither models nor libraries), so I had a look at the code: the models are downloaded from Hugging Face, and the repo comes with a precompiled whisper.cpp binary to execute them.
I have a question: I have 200-300 hours of audio recordings of interviews. I am using Otter.ai to automate transcription, and for each recording I export a ".vtt" file of the transcript.
What I'd like to do is create a type of ebook of all these transcripts, where if I click on a word, then the corresponding audio will start playing from roughly the same point in time within the interview.
Otter can do this already (if I'm online and logged in to their website), but I don't want to be tied to their website forever. I'd like to have a local copy that can perform similarly. Amazon ebooks can do this as well, I believe, where there is a corresponding verbatim audiobook. However, this project of mine is purely personal. I won't be selling my audio recordings or transcripts.
Any advice? Could software discussed here be helpful in what I'm trying to accomplish?
If you already have a .vtt, this is not a hard exercise to do e.g. entirely in a browser: parse the .vtt (they're simple text), lay out the text as you like with each segment being a clickable element (e.g. a link), and hook that up to seek an `<audio>` element to where you like.
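In case it helps to make that concrete, here's a rough, untested sketch in Python of the same idea: parse the .vtt cues and write out a single HTML page where each cue is a clickable span that seeks an `<audio>` element to the cue's start time. The file names ("interview01.vtt", "interview01.mp3", "interview01.html") are just placeholders for one of your recordings.

```python
import html
import re
from pathlib import Path

# Placeholder file names; point these at one of your own recordings.
VTT_FILE = Path("interview01.vtt")
AUDIO_FILE = "interview01.mp3"
OUT_FILE = Path("interview01.html")

# Matches the start time of a cue line like "00:01:02.345 --> 00:01:05.000"
# (the hours part is optional in .vtt files).
CUE_TIME = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s*-->")

def parse_vtt(path):
    """Return a list of (start_seconds, text) pairs, one per cue text line."""
    cues = []
    start = None
    for line in path.read_text(encoding="utf-8").splitlines():
        m = CUE_TIME.match(line.strip())
        if m:
            h = int(m.group(1) or 0)
            mnt, s, ms = int(m.group(2)), int(m.group(3)), int(m.group(4))
            start = h * 3600 + mnt * 60 + s + ms / 1000.0
        elif not line.strip():
            start = None           # blank line ends the current cue
        elif start is not None:
            cues.append((start, line.strip()))
    return cues

def build_page(cues):
    spans = "\n".join(
        f'<span class="cue" onclick="seek({start:.2f})">{html.escape(text)}</span>'
        for start, text in cues
    )
    return f"""<!doctype html>
<html><head><meta charset="utf-8"><style>
.cue {{ cursor: pointer; margin-right: 0.3em; }}
.cue:hover {{ background: #ffef9e; }}
</style></head><body>
<audio id="player" controls src="{AUDIO_FILE}"></audio>
<p>{spans}</p>
<script>
function seek(t) {{
  const p = document.getElementById("player");
  p.currentTime = t;
  p.play();
}}
</script>
</body></html>"""

if __name__ == "__main__":
    OUT_FILE.write_text(build_page(parse_vtt(VTT_FILE)), encoding="utf-8")
    print(f"Wrote {OUT_FILE}; open it in a browser next to {AUDIO_FILE}.")
```

A contractor could easily extend something like this to loop over all 200-300 recordings and generate an index page, so the whole thing works offline like the ebook you describe.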
Thank you. I'm not technical enough to implement it myself, but I know enough to understand very well what you mean. I'll hire a contractor, and this helps me communicate effectively.
AFAIK Whisper still can't handle multi-language content. If the audio has two languages (different narrators, for example), Whisper transcribes both of them during the first minute or so, and then either entirely skips one of the languages, or translates the foreign language to English, for the rest of the audio.
So, the value proposition of a subtitle-generating wrapper for Whisper would be to have an option to split the audio into ~1 minute segments, transcribe them separately, and somehow join the results accurately. And I don't think this one does that.
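To spell out what I mean, something like this rough, untested sketch: cut the audio into ~60 second pieces with ffmpeg, run Whisper on each piece so language detection happens per chunk, then shift the timestamps back when joining. The file names, chunk length, and the "base" model are just assumptions for illustration.

```python
import glob
import subprocess
import whisper  # pip install openai-whisper

AUDIO = "input.wav"        # assumption: a WAV file with mixed-language speech
CHUNK_SECONDS = 60

# 1. Cut the audio into ~60 s chunks (stream copy is fine for WAV input).
subprocess.run(
    ["ffmpeg", "-y", "-i", AUDIO, "-f", "segment",
     "-segment_time", str(CHUNK_SECONDS), "-c", "copy", "chunk_%03d.wav"],
    check=True,
)

model = whisper.load_model("base")

# 2. Transcribe each chunk on its own, so Whisper re-detects the language
#    per chunk, and 3. offset the timestamps back to the original timeline.
all_segments = []
for i, chunk in enumerate(sorted(glob.glob("chunk_*.wav"))):
    result = model.transcribe(chunk)   # language auto-detected per chunk
    offset = i * CHUNK_SECONDS
    for seg in result["segments"]:
        all_segments.append((seg["start"] + offset, seg["end"] + offset, seg["text"]))

for start, end, text in all_segments:
    print(f"[{start:8.2f} -> {end:8.2f}] {text.strip()}")
```

The hard part is the joining: cutting blindly every 60 seconds can split a word in half, so a real tool would probably want to cut on silence (e.g. with ffmpeg's silencedetect filter) and then reconcile segments that straddle a boundary.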
I could see myself using this; subtitling things is extremely time-consuming and there aren't that many tools that will automate it for you. It looks pretty straightforward to use - just two steps to install (if you already have FFmpeg and Python), and then one command to run the script.
Well done!
I wonder how much more a model would learn about subtitles from including audio AND video in training. Sure, the costs would be way bigger (parsing video even deterministically is 1.5 orders of magnitude worse than audio) but it might help with the edge cases where the speech is so unclear even the subtitle scene can't agree.
I'm not a native English speaker and I tend to use the LiveCaption application on Linux when I attend English-speaking online meetings. I would love to have subtitles in my native language (Greek) too while doing so.
I do the same with tech-oriented podcasts. They have clear speech, so transcribing them correctly is very easy.
Non-native English speaker here, too.