Amazing! I made a similar ebook-audiobook aligner years ago: https://github.com/r4victor/syncabook. At that time, I chose to synthesize the text and align the two audio sequences because I found text-alignment approaches (including ML-based ones) too compute-intensive and inadequate for long texts. I see Storyteller works by aligning the texts. Could you give some idea of how long it takes to sync a book?
Also, my experience was that audio and text versions are often very different (e.g. the audio having an intro missing from the text). It'd be very interesting to know how well Storyteller handles such cases. Does it require manual audio/text editing or handle the differences automatically?
Hello! syncabook is awesome, and indeed Storyteller does take "the opposite" approach when it comes to forced alignment.
Others have linked to the docs, where I go into detail about the syncing algorithm, but at a high level:
Storyteller uses Whisper to transcribe the audio to text (this is the most computationally expensive part of the process)
Then we use a Levenshtein-distance-based fuzzy search algorithm to find each chapter in the text (this is attempting to account for the difference between audio and text versions, as you said!)
Then for each chapter, we find the start and end timestamp of each sentence, again using a fuzzy search across the transcription.
In general, Storyteller does a pretty good job; it treats the ebook as the source of truth, which means that at the moment it sometimes misses introductory and ending pieces of the audiobook, though it's on the roadmap to have some support for explicitly triggering those when that happens.
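To make the fuzzy-search step above concrete, here is a minimal sketch. It is not Storyteller's actual code: the names `levenshtein` and `fuzzy_find` are hypothetical, and the brute-force sliding window is assumed purely for illustration (a real implementation would need something much faster for book-length text).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(needle: str, haystack: str) -> tuple[int, int]:
    """Return (offset, distance) of the haystack window closest to needle.

    Hypothetical helper: slides a window of len(needle) over haystack and
    keeps the first window with the smallest edit distance.
    """
    n = len(needle)
    best_offset, best_distance = 0, levenshtein(needle, haystack[:n])
    for offset in range(1, max(1, len(haystack) - n + 1)):
        distance = levenshtein(needle, haystack[offset:offset + n])
        if distance < best_distance:
            best_offset, best_distance = offset, distance
    return best_offset, best_distance
```

Here `needle` would be a sentence or chapter opening from the ebook and `haystack` the transcription; a small non-zero distance tolerates transcription errors.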
One obvious optimization is to sample the audio file at regular intervals and transcribe only those samples rather than the whole recording. Then just interpolate the locations in between. This can speed it up by a couple of orders of magnitude.
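A rough sketch of the interpolation idea, assuming the sampled transcriptions have already been matched to positions in the text. The `anchors` format (character offset, seconds) and the function name `interpolate_time` are assumptions made for this example, not part of either project.

```python
from bisect import bisect_right


def interpolate_time(anchors: list[tuple[int, float]], char_offset: int) -> float:
    """Estimate the audio timestamp for a text position.

    anchors: (char_offset, seconds) pairs sorted by strictly increasing
    char_offset, e.g. obtained by transcribing short audio samples.
    Positions outside the anchored range are clamped to the nearest anchor.
    """
    offsets = [offset for offset, _ in anchors]
    i = bisect_right(offsets, char_offset)
    if i == 0:
        return anchors[0][1]
    if i == len(anchors):
        return anchors[-1][1]
    (o0, t0), (o1, t1) = anchors[i - 1], anchors[i]
    # Linear interpolation between the two surrounding anchors.
    return t0 + (t1 - t0) * (char_offset - o0) / (o1 - o0)
```

The estimate is only as good as the assumption that reading speed is constant between anchors, which is why it trades precision for speed.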
This is true, but it really limits the ability to highlight the current sentence visually while it's being read, which is great for language learning and for reducing cognitive load. I actually spent a lot of time trying to get the timing as precise as I could to make this feel as natural as possible, and I think the effect is really nice!
And I hadn't realized that you can actually see sentences highlighted as they are being read. I'd love that for Chinese (I'm learning it, so it'd help me a lot). I'll try and see if it "just works", and contribute a patch if it doesn't.
There's an open ticket for languages other than English! https://gitlab.com/smoores/storyteller/-/issues/10. If you want to take a look, please do! I don't have any contributing guidelines yet, unfortunately, but they'll probably come soon. I think Whisper does have Cantonese and Mandarin support, so it should be possible to add support for those languages, though we'll have to look into nltk support for sentence tokenization as well!
I left it to languish once I discovered the demand wasn't that great and I was spending more time making the videos than people ultimately spent watching them.
You're welcome! I'm collecting learning resources on tech topics at https://bestresourcestolearnx.com and I was looking for a book on CS like this one for a long time. Further readings are especially good – I'm going to steal some of your recommendations ;)
I'll continue to go through books, docs, courses, and other resources on programming, CS, and math topics and share the best ones on https://bestresourcestolearnx.com to help people learn more effectively.
Think of other resources besides money that you can use for promoting projects: 1) your time and energy and 2) creativity. A reliable way to get traffic is to produce valuable content such as articles and videos. Sure, they also need promotion, but you can post them to HN, reddit, etc. from time to time, and it will be a more reliable and sustainable source of traffic than a one-time link to the project. It's the hard way, but it works 100%.
Look for new platforms and communities that are not sick of things being promoted. There is an excellent observation about Mastodon now in the comments.
SEO works. I've prototyped a new project recently – a collection of learning resources on IT topics (https://bestresourcestolearnx.com) – I haven't even started to promote it yet but it gets some relevant SEO traffic anyway.
Creative promotion ideas are project-specific, but I would think about how you can collaborate with people with some audience in your field. Like game developers giving away their games to streamers and influencers.
And mention your project when the opportunity arises (if appropriate, of course).
As far as I understand, Ruby's Ractors are what Python is now trying to do with subinterpreters (https://lwn.net/Articles/820424/). I mention them in the post.
I just skimmed through your link. Subinterpreters have been in the C API for a long time and never got around the GIL, and reading over the post that you linked, nothing has really changed.
Hi! I'm the author of this article. Thanks for posting it.
The GIL is an old topic, but I was surprised to learn recently that it's much more nuanced than I thought. So, here is my little research.
This article is a part of my series that dives deep into various aspects of the Python language and the CPython interpreter. The topics include: the VM; the compiler; the implementation of built-in types; the import system; async/await. Check out the series if you liked this post: https://tenthousandmeters.com/tag/python-behind-the-scenes
In the opening paragraph you state that the GIL prevents speeding up CPU-intensive code by distributing the work among multiple threads.
My understanding is that distributing work across multiple threads would not speed up CPU-intensive code anyway. In fact, it would add overhead due to threading.
There are two things you should consider here, wall clock time, and cpu time. Making code faster using multiple threads will increase CPU time by some amount, but because that work is now distributed between several cores it should actually reduce wall clock time.
There are many CPU bound tasks which can be made multithreaded and faster, but it does depend on the task and how much extra coordination you’re adding to make it multithreaded.
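The wall-clock versus CPU-time distinction can be measured directly in Python. This is a small sketch; the helper name `measure` is made up for the example. `time.perf_counter` tracks elapsed real time, while `time.process_time` tracks CPU time consumed by the whole process (summed over its threads).

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu
```

For example, `measure(time.sleep, 0.05)` reports a wall time of at least the sleep duration but almost no CPU time, whereas a well-parallelized CPU-bound job on several cores can show the opposite pattern: CPU time larger than wall time.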
The author is speaking about the general concept of threading, outside of Python (without using C extensions to help out as discussed in the article). In general, if you don't have a GIL, and you have 2 or more cores then if you run additional threads you will see a speedup for CPU-intensive code. The actual speedup will vary. A sibling comment mentions embarrassingly parallel problems, those are things like ray tracing, where each computation is independent of all the others. In those cases, you get near linear speedup with each additional core and thread. If there is more coordination between the threads (mutexes and semaphores, for instance, controlling access to a shared datum or resource) then you will get a less-than-linear speedup. And if there is too much contention for a common resource, you won't get any speedup and will see some slowdown due to the overhead introduced by threads.
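One common way to put numbers on "near-linear" versus "less-than-linear" speedup is Amdahl's law. The comment above does not mention it; it is brought in here as an assumption, and it only models the serial fraction of the work, not the outright slowdown that heavy contention can cause.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when parallel_fraction of the work scales perfectly.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of cores/threads doing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

With `parallel_fraction=1.0` (the embarrassingly parallel case) the speedup equals the number of workers; with any serial part it levels off at `1 / serial_fraction` no matter how many cores are added.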
If it was being distributed amongst Python threads (which run on one hardware thread), then CPU performance can't improve since they're just taking turns using the CPU. If you're running on multiple hardware threads (what I assume the author meant), that can yield better CPU performance since it will distribute work across real threads that can run in parallel.
The GIL restricts use of multiple hardware threads.
You can very well parallelise CPU-intensive problems. Look at e.g. "Embarrassingly parallel" on Wikipedia.
Intuitively if you can divide your work into chunks that are large enough, the scheduling overhead becomes negligible.
What are you talking about? Worker thread pools are the most common way to take advantage of multiple cores. You can typically see speedups (for highly parallel code) of nX for n cores.
Sure, but I think if you are discussing this type of thing in the context of Python you have to use the threads/processes terminology to avoid confusion.
The reason why this is true in Python is the GIL. In other languages without a GIL, multiple threads will run on multiple cores and can speed up CPU bound code.
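A small experiment along these lines, as a sketch: the same pure-Python CPU-bound function is run through a thread pool and a process pool. The function names `count_down`, `timed`, and `compare` are made up for this example, and the actual timings depend on the machine and the Python build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n: int) -> int:
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1
    return n


def timed(executor_cls, jobs: int, n: int) -> float:
    """Run `jobs` copies of count_down(n) on the given pool; return wall seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        list(pool.map(count_down, [n] * jobs))
    return time.perf_counter() - start


def compare(jobs: int = 4, n: int = 5_000_000) -> dict[str, float]:
    """Time the same workload with threads and with processes."""
    return {
        "threads": timed(ThreadPoolExecutor, jobs, n),
        "processes": timed(ProcessPoolExecutor, jobs, n),
    }
```

On a multi-core machine with a standard GIL-enabled CPython build, `compare()` would be expected to show the process pool finishing noticeably faster, since each process has its own interpreter and GIL. On platforms that spawn rather than fork worker processes, call it from under an `if __name__ == "__main__":` guard.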
Hi! I'm the author of this post. I've been writing asynchronous Python code with async/await for quite a while but didn't have a perfect understanding of how it actually works: what await does; what an event loop really is and how it runs coroutines; what coroutines are; why Python has native coroutines as well as generator-based coroutines; how asyncio works; and so forth... In this post I've tried to answer all these questions. After reading it, you should be able to reason about async/await code almost as easily as you reason about regular Python code.
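To give a flavor of what "an event loop running coroutines" means, here is a toy sketch. It is not how asyncio is implemented: the `Yield` awaitable and the `run_all` round-robin loop are invented for this example and only show that a native coroutine can be resumed with `send()` until it raises `StopIteration`.

```python
from collections import deque


class Yield:
    """Awaitable that suspends the coroutine once, handing control to the loop."""

    def __await__(self):
        yield


async def worker(name: str, steps: int, log: list) -> str:
    for i in range(steps):
        log.append(f"{name}:{i}")
        await Yield()  # suspension point: the loop may run another coroutine
    return name


def run_all(coros) -> list:
    """Toy round-robin event loop: resume each coroutine until all finish."""
    ready = deque(coros)
    results = []
    while ready:
        coro = ready.popleft()
        try:
            coro.send(None)  # run until the next suspension point
        except StopIteration as stop:
            results.append(stop.value)  # coroutine returned
        else:
            ready.append(coro)  # still running: schedule it again
    return results
```

Results come back in completion order, and the shared `log` shows the workers' steps interleaving even though everything runs in a single thread.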
Hi! I'm the author of this article. Thanks for posting it.
I've been programming in Python for quite a while but didn't really understand how the import system works: what modules and packages are exactly; what relative imports are relative to; what's in sys.path and so on. My goal with this post was to answer all these questions.
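As a small illustration of two of these questions, the sketch below builds a throwaway package in a temporary directory, makes it importable by adding that directory to `sys.path`, and imports a module that uses a relative import. The package and function names (`demo_pkg`, `build_demo_package`, `import_demo`) are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/ with a module that uses a relative import."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("VALUE = 42\n")
    (pkg / "main.py").write_text("from .helpers import VALUE\n")


def import_demo() -> tuple[str, int]:
    """Import demo_pkg.main and return (its parent package name, VALUE)."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)  # make the temp directory searchable
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_pkg.main")
            # The leading dot in `.helpers` is resolved against this package.
            return module.__spec__.parent, module.VALUE
        finally:
            # Undo the changes so the demo leaves no trace in this process.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == "demo_pkg" or n.startswith("demo_pkg.")]:
                del sys.modules[name]
```

The relative import works because `demo_pkg.main` is imported as part of a package; running `main.py` directly as a script would fail, since a top-level script has no parent package to be relative to.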
Anyway, impressive scale for Swarm! Are there any crucial features you miss in Swarm?