TexTube: Chat with any YouTube video transcript in ChatGPT fast (chatgpt.com)
125 points by ofou 72 days ago | 88 comments



I actually just finished making a service that does something similar, but it also transforms the transcripts into polished written documents with complete sentences and nice markdown formatting. It can also generate interactive multiple-choice quizzes, and it supports editing the markdown files with revision history and one-click hosting.

I'm still doing the last testing of the site, but might as well share it here since it's so relevant:

https://youtubetranscriptoptimizer.com/

There might still be a few rough edges, so keep that in mind!


The pricing is confusing: it gives counts of short videos rather than a price per unit of time.

The vodcasts that most need transcription are long form. After the "don't make me do math" pricing, you do have a table of minutes, up to 60, so for a typical ContraPoints vodcast episode, say, you multiply by 3 and find that it could cost $30 to turn into an optimized transcript. (Which the creator might well pay for if they value their time, but viewers might not.)


Thanks for the feedback. I'll try to clarify the pricing table a bit better. And yes, this is targeting creators more. If it turns out that viewers are the better target market, I might pivot it a bit. And I'm considering adding a discount for longer videos.


I signed up, and it's a beautiful UI, with impeccable results for the PDF or Markdown flavors in particular. Speed was impressive on a video that had subtitles off. Bundling all formats into a zip is a stroke of genius.

Does your tool work on 3-hour vodcasts? There are quite a few long series I would far prefer to read rather than listen to.


Wow, thanks for the great feedback! Yes, it will definitely work for a 3-hour video, but just be prepared to get an incredibly long document!


Why limit this to YouTube? It should work on any body of text, is that right?


Yes, I'm also working on another version that is document-centric. It's a bit of a different problem. In the case of YouTube video transcripts, we are dealing with raw speech utterances. There could be run-on sentences, filler words and other speech errors, etc. Basically, it's a very far cry from a polished written document. Thus we need to really transform the underlying content to first get the optimized document, which can differ quite significantly from the raw transcript. Then we use that optimized document to generate the quizzes.

In the case of a document-only workflow, we generally want to stick very closely to what's in the document, and just extract the text accurately using OCR if needed (or extract it directly if OCR isn't needed), then reformat it into nice-looking markdown, but without changing the actual content itself, just its appearance. Once we've turned the original document into nice-looking markdown, we can use that to generate the quizzes and perhaps other related outputs (e.g., Anki cards, PowerPoint-style presentation slides, etc.).

Because of that fundamental difference in approach, I decided to separate it into two different apps, though I'm planning on reusing much of the same UI and backend structure. The document-centric app also seems to have a broader base of potential users (like teachers: there are a lot of teachers out there, way more than there are YouTube content creators). I started with the YouTube app because my wife makes YouTube videos about music theory and I wanted to make something that at least she would actually want to use!


This approach really doesn't make sense to me. The model has to output the entire transcript token by token, instead of simply adding it to the context window...

A more interesting idea would be a browser extension that lets you open a chat window from within YouTube, letting you ask it questions about certain parts of the transcript with full context in the system prompt.


That's initially what I thought this was. Seems like somebody had the same concept, there's an extension called "AskTube" which looks like it does exactly this.

https://chromewebstore.google.com/detail/asktube-ai-youtube-...


For sure, that's an interesting idea, but potentially very costly (for longer videos). A plus side of this strategy is that the transcription gets cleaned up a lot, and the math notation gets fixed too. So it's just cleaner, well-formatted text for people who prefer reading a video to mindlessly watching it.

We at Emergent Mind are working on providing bits of a technical transcript to a model and then asking follow-up questions. You can check it out at http://emergentmind.com if you're curious.


Until I read other comments here, I assumed that's what they were doing, since it bugged out on me and didn't regurgitate the transcript back to me, yet still let me ask questions about it.

https://chatgpt.com/share/66e9f5ae-8d20-8000-b3a5-7c1ba928b8...


I'm not sure what happened there, but I used the same link as you, and this was the intended functionality:

https://chatgpt.com/share/66ea22ad-5d20-8009-a3b0-909c5f500a...


How is it supposed to work? When I open this, I just see a prompt that says "Get the full transcription of any Youtube video, fast. Studies suggest that reading leads to better retention of complex information compared to video watching. Only English videos currently."

I tried pasting the URL of a YouTube video and I get the message "I'm unable to access the video directly, as the tool needed for that is disabled. However, if you'd like, you can summarize the video or let me know how I can assist with it!"


Can you share the link?

This is what I'm getting: https://chatgpt.com/share/66ea4f36-90b4-8009-8b6c-02bc26cff9...




It does not work with long form conversations like podcasts.

"I was unable to retrieve the transcript for this video due to its large size."


Coming soon! Currently, it works for videos under one hour. This limitation is due to ChatGPT's context window when using Plugins. I don't know why, since it should support 200k tokens... Alternatively, you can use https://textube.olivares.cl to get the full transcription for any video in English.


Do you plan on open sourcing or letting us self host this tool? I would like to grab a bunch of videos but don't want to spam your server :)


And I bet it doesn't work with podcasts where any of the participants say "ignore all previous instructions, do [something else]"


I found a quick video with "ignore all previous instructions, do [something else]" on YT and it still works

https://chatgpt.com/share/66ea502e-935c-8009-a9f3-5ce9173e57...


So the only types of videos this is suitable for, it doesn't work on.


You can get transcripts of any length using textube.olivares.cl or the API directly. The limitation lies in the current model used by Plugins, not in the API itself.

Here's Lex's 8-hour podcast about Neuralink: https://textube.olivares.cl/watch?v=Kbk9BiPhm7o&format=txt
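If you just want the raw text programmatically, a minimal Python sketch against that same watch endpoint could look like this (the parameters mirror the example URL above; this is an illustration, not official client code):

    # Minimal sketch: fetch the plain-text transcript from the textube
    # endpoint shown above. Query parameters mirror the example URL.
    import requests

    resp = requests.get(
        "https://textube.olivares.cl/watch",
        params={"v": "Kbk9BiPhm7o", "format": "txt"},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.text[:1000])  # first chunk of the 8-hour transcript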


I get what this is doing, but calling it "chat with a transcript" is weird. Like, documents and videos don't chat. We chat with a bot who has seen the document/video.


We are going to chat with all kinds of stuff soon xD


You're way too late starting that fight. "Chat with [anything]" has been an established term for a long time now.


In the enthusiast community, I suppose. It's not too late to adopt clearer terminology; this will be important as these things try to reach mainstream users.


I don't know if everyone has access to it (might just be yt premium), but many videos have an "ask gemini about this video" button, where you can directly ask questions about the video.


It might be a preview or something, because I have YT Premium and it doesn't show up anywhere for me. Can you share a video where that works? Like this one:

https://www.youtube.com/watch?v=zjkBMFhNj_g


It's only available in the Android app, but you can activate it here: https://www.youtube.com/new


Here's a video demo from about 3 months ago:

https://www.youtube.com/watch?v=fgYIFiWgBl8

It looks like it's currently limited to Android phones.


It is a beta feature in YouTube premium and doesn't seem to be for all videos, but it has been extremely useful in my experience. You can even ask where in a video things are discussed etc.


It’s really ironic that YouTube basically pushed videos to be at least ~ten minutes long through commercial incentives, then offers AI features to cut through that filler garbage.


While this is true, the thrust of what YouTube was doing was to incentivize the creation of videos that are 10+ minutes because they need to be 10+ minutes, not 10+ minutes because you're trying to game the system.


Well, YTPremium users don’t see those ads. They’re the only ones who get the AI tool.


I’d love this but from the yt home page and search results page. That would let me ask chatgpt if the video really contains the info its thumbnail/title suggest it does without having to leave the current browser tab.

I’ve done this by manually copy/pasting a YT transcript into ChatGPT (and later streamlining it into a bash function), and it was quite effective, allowing me to dodge a couple of clickbait time wasters (videos that looked important but really were just fluffing up unimportant nonsense).


Very nice. I made a thing in Python which summarizes a YouTube transcript in bullet points. Never thought about asking it questions, that's a great idea!

I just run yt-dlp to fetch the transcript and shove it in the GPT prompt. (I think I also have a few lines to remove the timestamps, although arguably those would be useful to keep.)

My prompt is "{transcript} Please summarize the above in bullet points"

The trick was splitting it up into overlapping chunks so it fits in the context size. (And then summarizing your summary because it ends up too long cause you had so many chunks!)

These days that's not so important; usually you can shove an entire book in! (Unless you're using a local model; those still have small context sizes, but work pretty well for summarization.)
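For anyone curious, a rough sketch of that workflow in Python (not my exact script; the chunk sizes, gpt-4o-mini model name, and output filename are just illustrative) looks something like:

    # Rough sketch of the yt-dlp + GPT summarization workflow described above.
    import re
    import subprocess
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def fetch_transcript(url: str) -> str:
        # Ask yt-dlp for the auto-generated English subtitles only (no video).
        subprocess.run(
            ["yt-dlp", "--skip-download", "--write-auto-subs",
             "--sub-langs", "en", "--sub-format", "vtt", "-o", "video", url],
            check=True,
        )
        vtt = Path("video.en.vtt").read_text()
        lines = []
        for line in vtt.splitlines():
            if not line.strip() or "-->" in line or line.startswith("WEBVTT"):
                continue  # drop headers and timestamp lines
            lines.append(re.sub(r"<[^>]+>", "", line))  # strip inline timing tags
        return " ".join(dict.fromkeys(lines))  # dedupe repeated caption lines

    def summarize_chunk(text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": f"{text}\n\nPlease summarize the above in bullet points"}],
        )
        return resp.choices[0].message.content

    def summarize_video(url: str, chunk_chars: int = 12000, overlap: int = 1000) -> str:
        text = fetch_transcript(url)
        # Overlapping chunks so sentences cut at a boundary aren't lost.
        chunks = [text[i:i + chunk_chars]
                  for i in range(0, len(text), chunk_chars - overlap)]
        partials = [summarize_chunk(c) for c in chunks]
        if len(partials) == 1:
            return partials[0]
        # Summarize the summaries, since many chunks make the result too long.
        return summarize_chunk("\n\n".join(partials))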


I also built something similar using yt-dlp and llm CLI and wrote a post about it https://shekhargulati.com/2024/07/30/building-a-youtube-vide.... Script here https://github.com/shekhargulati/llm-tools/blob/main/yt-summ...


Same! It's been a nifty little tool for helping me decide which videos are worth watching. https://github.com/davidhaas6/digest


If you're going as far as using yt-dlp, why not run the audio through Whisper?


Interesting, I haven't used Whisper, is it cost effective? Seems to be about 36 cents per (hour long) video? How long does processing take?


You can run it locally, and it's really fast. But since YouTube transcription is really good, I don't see why you'd use Whisper and get a worse transcription (unless maybe it's on videos that Google did not transcribe for whatever reason).


> But since YouTube transcription is really good

Are you sure you're looking at automatic transcripts? YouTube transcripts are bizarrely low quality if they're not provided by the creators (I've actually used my Google Pixel's live transcription to make better captions occasionally).

I just checked a video my girlfriend uploaded a week ago and the auto-transcript was still pretty messy. I've used Whisper for the same task and it's significantly better.


That's crazy, months ago I compared whisper v2 transcripts with YouTube transcripts generated on my video and found them to be identical, down to the timestamps.

I know people who upload a video on YouTube unlisted just to get transcripts generation for free and then delete the video.


Agreed. However, you can get great YT transcriptions using GPT-4o mini to clean them up.
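A minimal sketch of that cleanup pass, assuming the official openai Python package (the prompt wording is just an example, not a specific recipe):

    # Sketch of cleaning a raw YouTube auto-transcript with GPT-4o mini.
    from openai import OpenAI

    client = OpenAI()

    def clean_transcript(raw_text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Rewrite this raw speech transcript into clean, punctuated "
                            "prose. Remove filler words; do not add or drop content."},
                {"role": "user", "content": raw_text},
            ],
        )
        return resp.choices[0].message.content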


36 cents an hour is how much it costs to hire an entire GPU like an A4000. I can assure you Whisper runs much, much faster than 1x!


A few derivative projects are faster than 1x, insanely-fast-whisper being the fastest I've tried.

whisper v3 large on release day was around 1x on a 4090


The security against downloading audio from YouTube has been upped recently with 'PO tokens'.

Whisper is only a few tenths of a cent per hour transcribed if transcribing on your gpu though, at about 30x real-time on a 3080 etc. with batching.
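For reference, a bare-bones local run with the open-source openai-whisper package looks roughly like this (model name and file path are illustrative; the batched/faster variants mentioned above use different packages):

    # Bare-bones local Whisper transcription; assumes the `openai-whisper`
    # package and a CUDA-capable GPU.
    import whisper

    model = whisper.load_model("large-v3")          # downloads weights on first use
    result = model.transcribe("episode_audio.mp3")  # returns text plus timed segments

    print(result["text"][:500])
    for seg in result["segments"][:5]:
        print(f'{seg["start"]:7.1f}s  {seg["text"]}')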


> The security against downloading audio from YouTube has been upped recently with 'PO tokens'.

do you have a source? more generally is there a community or news source for youtube "api" news like this?


I haven't been following closely the last few weeks, but you can check the issues in this repo, for example: https://github.com/distubejs/ytdl-core


Tbh I've not had trouble with this for personal use.


I've been using Voxscript [0] for a while; after comparing the two, I think Voxscript is better: it gives longer, more detailed summaries, while TexTube just seems to give a very brief, impersonal overview. It's easy to try both and see which you prefer.

[0] https://chatgpt.com/g/g-g24EzkDta-voxscript


TexTube is not giving summaries but the actual transcripts. Plus, mine is way faster ;)

Compare the results:

TexTube: https://chatgpt.com/share/66e9f424-32c4-8009-b761-c8a8d6fbec...

VoxScript: https://chatgpt.com/share/66e9f443-31d8-8009-b396-dba11b2f5b...


Hmm, it didn't work that way for me. First I asked it to summarise a video, then I simply posted the link to the video assuming it would give the transcript; in both cases it summarised the transcript.

But if I start a new session and simply paste the link to the video it gives the transcript. I’m not sure an llm is the best solution to getting full transcripts.


You just copy and paste a YouTube URL, and that's it.


Is this better than the YouTube-generated transcript/captions you'd get from something like https://github.com/Kakulukian/youtube-transcript?


I pasted a video link and it says “Not Found”. Absolutely not the best first impression.


Can you share the link?


What does it mean by chat with a transcript?

I.e. what are the kind of things I can ask and get value from?


First, I would say that reading is faster than watching. Therefore, it is more time-efficient to read a YouTube video, especially if it covers technical content or interesting ideas. Additionally, you can ask follow-up questions about the content, and since it's in an OAI conversation, you can leverage the "intelligence" of the model to help you understand the parts that you find difficult. Sometimes, I watch technical YouTube videos and wish I had a written version; so here it is.

This is an interesting example; it feels different from watching the ~12-min video. https://chatgpt.com/share/66e9eaff-248c-8009-9761-d848d97881...


Nothing, it means nothing, like most of this "AI" hype nonsense.

They copy-paste text transcripts into an LLM and have it generate more text based on its training data and the prompt. You can't "chat" with a text document, of course.


Chat with the document means chat about that document with an LLM that has "read" it.

It can be useful; it's not hype nonsense.


Ahh ok.

So rather than watch the video or read the transcript you just ask the one thing you want to know.

Could it take you to the moment in the video that is useful too?


You could ask it for a couple of verbatim sentences from the transcript that are most related to what you are interested in, then find the timestamp for that text. (There could be UI for this.)

Another solution would be to skip the LLM prompting part altogether (a rough code sketch follows the list) and:

1. break the transcript into short sections

2. create embeddings from them and remember the timestamps for each

3. embed your query (what are you interested in)

4. calculate the closest embedding in the transcript to your query

5. return the original timestamp
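Something like this, say with OpenAI embeddings and numpy (the function names, and the assumption that the transcript is already split into (timestamp, text) pairs, are mine, just to make the steps concrete):

    # Sketch of the embedding-based lookup above. Assumes OpenAI's embeddings
    # API and a transcript pre-split into (timestamp_seconds, text) sections.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    def find_timestamp(query, sections):
        vecs = embed([text for _, text in sections])  # step 2: embed each section
        qvec = embed([query])[0]                      # step 3: embed the query
        sims = vecs @ qvec / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(qvec))
        best = int(np.argmax(sims))                   # step 4: closest section
        return sections[best][0]                      # step 5: its timestamp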


That's a good idea. However, I believe the challenging part lies in first reconstructing the short utterances into coherent, meaningful paragraphs.

Currently, with the API [1], you can retrieve a JSON with timestamps. The main issue, though, is how to parse the text effectively into meaningful sentences, and then add the timestamps at the beginning of the paragraph. WIP.

[1]: https://textube.olivares.cl/watch?v=9iqn1HhFJ6c&format=JSON


I’m not sure I follow. Can you explain ‘you can’t chat with a text document’ because you clearly can.


Is anyone even chomping at the bit to hear a pedant explain how "chatting with a text document" isn't the most precise way to phrase this concept that we all understand?


chatting with a bot about a text document.

chatting about a text document

Chatting with a text document implies it has AI or magical abilities.

You wouldn't say you are chatting with your dog if you are talking to your wife about your dog.


IRC is just multiplayer Notepad.


allofus.ai already aggregates all of the thinking of any creator on YouTube into a single mental model and allows you to interact with their synthetic self.


Now that does sound intriguing, but it just leads to a blank page...?


It is purely synthetic interaction, though.

Asking questions about a transcript is at least grounded in something real (a video).


This is trained on their closed captions


When I try it it just says "Not found"


Can you share the link?


I clicked on one of the examples, which was "State of GPT by Andrej Karpathy"


Sometimes, the model used by Plugins gets confused, especially when the transcript is too long. It might just load the content into memory as a response without saying much more; you can then engage in follow-up chat interactions. But I just tried the link again and it seems to work. Sometimes you have to try a few times, or explicitly ask for the transcript if it isn't shown.

https://chatgpt.com/share/66eadbad-1d3c-8009-91f0-abe3cf4d36...


Does it not break YT ToS?


The most interesting thing about this is that OpenAI apparently does not own the chatgpt.com domain.


They do; this URL just links to a custom GPT hosted on OpenAI's chatgpt.com domain.


Seems like fishing with hand grenades to me. I just download the subs and grep that.


Even just the experience with man pages and "/<term>" search shows that it's a suboptimal strategy; it leaves you wishing you could query an engine that actually understands what it's reading.


Really? I generally have a good experience with searching manpages. My big gripe with those is the man program itself.


Mine is that directly asking a question ("How to...") would be much faster than finding the information through grep or highlight-aided skimming. It would just be more efficient.

Also, in order to find a feature through a literal string you first have to guess it... Language is inherently fuzzy, so literal searches are weaker for this purpose than an interface that deals with the fuzzy side of expression.


Nice, hallucinate a text document about video content. Next is hallucinating a video from a text document hallucinated from a video?


It uses a real transcript.


Awesome work, OP! I really believe we’ll soon be able to get a full four-year education just from YouTube. The challenge right now is sifting through the infotainment that the algorithms tend to push.

This is actually what inspired us to create Lectura: https://lectura.xyz/

We’ve added features that promote curiosity and deeper learning, like ELI5 explanations, suggested queries based on transcripts, quizzes to track retention, and more.

If you’re interested in joining us to build out the platform, feel free to reach out at neil at lectura dot xyz


shill



