I've always thought of immutability as great for situations where you want to "explore" (clone complex current state and go do some "what if"), need internal transactions, or allow time travel (snapshot/undo/redo) - situations where the state sharing is both efficient and feels "natural".
Also if the data/history are relatively small compared to the available memory it's a fine default that generally leads to "nicer" code.
Video doesn't seem at first glance like such a great fit.
Video can be played backwards and forwards, sought, or jumped to a particular point in time. That fits well with the "time travel" benefit of immutable data structures. The author mentioned it was extremely easy to implement some kind of checkpointing or keyframing with immutability. That's exactly playing to immutability's strengths.
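To make the checkpoint idea concrete, here's a minimal Python sketch. It is not how asciinema-player does it; `Frame`, `apply_line` and `seek` are names made up for illustration, and a frame is assumed to be just a timestamp plus a tuple of line strings. Each update builds a new frame and leaves the old one untouched, so keeping every frame around as a checkpoint costs only the lines that actually changed.

```python
import bisect
from typing import NamedTuple


class Frame(NamedTuple):
    """Hypothetical immutable terminal snapshot: a timestamp plus its lines."""
    time: float
    lines: tuple


def apply_line(frame, time, row, text):
    """Return a new Frame with one row replaced; the old Frame is unchanged.

    The untouched rows are the same string objects as before, so successive
    frames share most of their data.
    """
    lines = frame.lines[:row] + (text,) + frame.lines[row + 1:]
    return Frame(time, lines)


def seek(history, t):
    """Return the latest frame whose time is <= t (or the first frame)."""
    times = [f.time for f in history]
    i = bisect.bisect_right(times, t) - 1
    return history[max(i, 0)]


blank = Frame(0.0, ("", "", ""))
f1 = apply_line(blank, 1.0, 0, "$ ls")
f2 = apply_line(f1, 2.0, 1, "README.md")
history = [blank, f1, f2]
```

Seeking backwards is then just a lookup: `seek(history, 1.5)` hands back `f1` as it was, with no need to replay or undo anything.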
The downside of immutability is the sheer volume of data (in the form of pixels) that needs to be pushed around. A single 4k frame is 8.3 million pixels, so you are looking at over 30MiB of data for 32-bit color, and you gotta push 30, maybe 60 of those a second. Maybe if you have a really good garbage collector (or a custom one, because frames are all the same size) you can get away with allocating that much data and freeing it every second. But that doesn't free you from the fact that you are not utilizing hardware caches well; you don't get good spatial locality at the hardware level unless you reuse the same physical pages of memory for every frame. And you can basically only do that if you have a mutable design.
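For anyone who wants to check the numbers, here's the back-of-the-envelope arithmetic in Python. It assumes 4k means 3840x2160 and 4 bytes per pixel for 32-bit color; the variable names are just for this sketch.

```python
# Assumptions: "4k" = 3840x2160 (UHD), 32-bit color = 4 bytes per pixel.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 4
MIB = 2 ** 20

pixels = WIDTH * HEIGHT                      # about 8.3 million
frame_mib = pixels * BYTES_PER_PIXEL / MIB   # size of one uncompressed frame

# Raw data rate if every frame is a freshly allocated buffer.
rate_30fps = frame_mib * 30
rate_60fps = frame_mib * 60

print(f"{pixels:,} pixels, {frame_mib:.1f} MiB per frame")
print(f"{rate_30fps:.0f} MiB/s at 30 fps, {rate_60fps:.0f} MiB/s at 60 fps")
```

So it's roughly 0.9 GiB/s at 30 fps and nearly 1.9 GiB/s at 60 fps of allocation alone, before any decoding work.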
This comment is completely off: obviously asciinema didn't store each individual pixel in its data structure. That would be a completely stupid thing to do even for a regular data structure instead of an immutable one…
That's not how asciinema-player works though. The player internally represents the terminal buffer as a grid of characters. So for an 80x24 terminal you have 80*24=1920 grid cells, each keeping a unicode char + color attrs. When rendering, the adjacent cells of each line are grouped by their common color attrs, resulting in (usually) a small number of span elements with text and proper style/class. You can see this in action by going to asciinema.org, opening a random recording, pausing it, then inspecting the terminal with your browser's DOM inspector.
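The grouping step can be sketched in a few lines of Python. This is only an illustration of the idea, not the player's actual code: `Cell`, `line_to_spans` and the attribute tuples below are hypothetical, and a real renderer would turn each pair into a span element with the matching style/class.

```python
from itertools import groupby
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical grid cell: one character plus its color attributes."""
    char: str
    attrs: tuple


def line_to_spans(cells):
    """Merge runs of adjacent cells with identical attrs into (attrs, text)."""
    return [
        (attrs, "".join(cell.char for cell in run))
        for attrs, run in groupby(cells, key=lambda cell: cell.attrs)
    ]


# Made-up attribute values, purely for the example.
green = (("fg", 2),)
plain = ()

line = [Cell("$", green), Cell(" ", plain), Cell("l", plain), Cell("s", plain)]
spans = line_to_spans(line)
```

Here the four cells collapse into two spans, a green `$` and a plain ` ls`, which is why a typical line ends up as only a handful of DOM elements.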
Sure, if you don't break it down to individual pixels, the data is way less. Ultimately, getting a well-performing GC is finding enough idle/spare/background cycles to scan memory and recycle it at a greater rate than allocation. If the GC falls behind then inevitably you are going to end up with a big pause. I don't think there's enough memory bandwidth to decode 4k video the naive way, but a small terminal will probably be OK. That said, it's still less efficient than just poking the bytes in memory.