Error rates are still high, as you quickly find if you have latency-sensitive information that you need to ship over the public internet, like live video.
Also, you can trade off bandwidth for error rate at a lower level of the networking stack, so if you know your applications will use error-correcting protocols like TCP, you can make your switches and routers talk to each other a little faster.
Protocols like QUIC and SRT (used for video) are great; forward error/erasure correction is something I would also mention as a large part of the rise of UDP-based over TCP-based transfer protocols.
If your video is short enough to encode within the Lambda limits, it is worth considering this approach: https://aws.amazon.com/blogs/media/processing-user-generated...
MediaConvert is expensive; generally the AWS video stack is expensive (MediaConvert, Elemental, MediaConnect).
For what I have done (2K and down, main profile) the quality has been OK. I have also read some complaints about quality at high res, but I am a happy customer.
Same with video parsers and tooling: they frequently expect a whole mp4, or a whole video, to be there before they can parse it, yet the gstreamer/ffmpeg APIs deliver the content as a stream of buffers that you have to process one buffer at a time.
Traditionally, ffmpeg would build the mp4 container while the transcoded media was written to disk (in a single contiguous mdat box after the ftyp) and then put the track descriptions and sample tables in a moov at the end of the file. That's efficient because you can't precisely allocate the moov before you've processed the media (in one pass).
But when you would load the file into a <video> element, it would of course need to buffer the entire file to find the moov box needed to decode the NAL units (in the case of avc1).
A simple solution was then to repackage by simply moving the moov from the end of the file to before the mdat (adjusting the chunk offsets). Back in the day, that would make your video start instantly!
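For reference, that relocation is a remux, not a re-encode. A minimal sketch driving the ffmpeg CLI from Python (file names are placeholders):

```python
import subprocess

# Copy the streams as-is (-c copy); +faststart makes the mp4 muxer write the
# moov at the end as usual, then move it in front of the mdat and patch the
# chunk offsets, so playback can start before the whole file has downloaded.
subprocess.run(
    ["ffmpeg", "-i", "progressive.mp4", "-c", "copy",
     "-movflags", "+faststart", "faststart.mp4"],
    check=True,
)
```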
This is basically what CMAF is: the ftyp and moov get sent at the beginning (and frequently get written out as an init segment), and then the rest is a continuous stream of moofs and mdats, chunked as per gstreamer/ffmpeg specifics.
I was thinking of progressive MP4, with the sample tables in the moov. But yes, CMAF and other fragmented MP4 profiles have the ftyp and moov at the front, too.
Rather than putting the media in a contiguous blob, CMAF interleaves it with moofs that hold the sample byte ranges and timing. Moreover, while this interleaving allows most of the CMAF file to be progressively streamed to disk as the media is created, it has the same catch-22 as the "progressive" MP4 file in that the index (sidx, in the case of CMAF) cannot be written at the start of the file until all the media it indexes has been processed.
When writing CMAF, ffmpeg will usually omit the segment index, which makes fast seeking painful. To insert the `sidx` (after the ftyp+moov but before the moof+mdat pairs) you need to repackage (but not re-encode).
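One hedged sketch of that repackaging step: this assumes an ffmpeg build whose mp4 muxer supports the global_sidx movflag (and a seekable output), and it may not preserve the original fragment boundaries, since frag_keyframe re-fragments at keyframes of the copied stream.

```python
import subprocess

# Remux only (-c copy): write a fragmented MP4 with ftyp+moov up front and a
# single global sidx index ahead of the moof+mdat pairs, so players can seek
# without scanning the whole file. File names are placeholders.
subprocess.run(
    ["ffmpeg", "-i", "cmaf_input.mp4", "-c", "copy",
     "-movflags", "frag_keyframe+empty_moov+default_base_moof+global_sidx",
     "indexed_fragmented.mp4"],
    check=True,
)
```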
It is possible that this is not a fault of the parser or tooling. In some cases, specifically when the video file is not targeted for streaming, the moov atom is at the end of the mp4. The moov atom is required for playback.
That's intentional, and it can be very handy. Zip files were designed so that you could make an archive self-extracting: you could strap a self-extraction binary to the front of the archive, which, rather obviously, could never have been done if the executable code followed the archive.
But the thing is that the executable can be anything, so if what you want to do is to bundle an arbitrary application plus all its resources into a single file, all you need to do is zip up the resources and append the zipfile to the compiled executable. Then at runtime the application opens its own $0 as a zipfile. It Just Works.
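A tiny sketch of that trick in Python (file names are made up): because the zip central directory lives at the end of the file and is located by scanning backwards, the stdlib zipfile module happily opens an archive with arbitrary bytes prepended.

```python
import shutil
import zipfile

# Build a bundle: some executable (or script) followed by a zip of resources.
# "app_binary" and "resources.zip" are hypothetical input files.
with open("bundle", "wb") as out:
    for part in ("app_binary", "resources.zip"):
        with open(part, "rb") as f:
            shutil.copyfileobj(f, out)

# At runtime the application can open its own file as a zip: ZipFile finds
# the end-of-central-directory record from the end of the file, so the
# prepended executable bytes are simply ignored.
with zipfile.ZipFile("bundle") as z:
    print(z.namelist())
```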
Also, it makes it easier to append new files to an existing zip archive. No need to adjust an existing header (and potentially slide the whole archive around if the header size changes), just append the data and append a new footer.
I’ve found the Rust ecosystem to be very good about never assuming you have enough memory for anything and usually supporting streaming styles of widget use where possible.
Ha! I was literally thinking of the libs for parsing h264/5 and mp4 in Rust (so not using unsafe gstreamer/ffmpeg code) when moaning a little here.
Generally I find the Rust libraries and crates to be well designed around readers and writers.
My experience that played out over the last few weeks led me to a similar belief, somewhat. For rather uninteresting reasons I decided I wanted to create mp4 videos of an animation programmatically.
The first solution suggested when googling around is to just create all the frames, save them to disk, and then let ffmpeg do its thing from there. I would have just gone with that for a one-off task, but it's a pretty bad solution if the video is long, or high res, or both. Plus, what I really wanted was to build something more "scalable/flexible".
Maybe I didn't know the right keywords to search for, but there really didn't seem to be many options for creating frames, piping them straight to an encoder, and writing just the final video file to disk. The only one I found that seemed like it could maybe do it the way I had in mind was VidGear[1] (Python). I had figured that with the popularity of streaming, and video in general on the web, there would be so much more tooling for these sorts of things.
I ended up digging way deeper into this than I had intended, and built myself something on top of Membrane[2] (Elixir).
It sounds like a misunderstanding of the MPEG concept. For an encode to be made efficiently, it needs to see more than one frame of video at a time. Sure, I-frame-only encoding is possible, but it's not efficient and the result isn't really distributable. Encoding wants to see multiple frames at a time so that P and B frames can be used. Also, the way to get the best bang for the bandwidth buck is to use multipass encoding. Can't do that if all of the frames don't exist yet.
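For the multipass point, a minimal sketch of the usual two-pass flow, driving the ffmpeg CLI from Python (input/output names and the 2M target bitrate are made up):

```python
import subprocess

# Pass 1 analyzes the whole video and only writes the x264 stats file
# (the output itself is discarded); pass 2 does the real encode, using those
# stats to distribute the bit budget across all frames.
common = ["ffmpeg", "-y", "-i", "input.mp4", "-c:v", "libx264", "-b:v", "2M"]
subprocess.run(common + ["-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)  # use NUL on Windows
subprocess.run(common + ["-pass", "2", "-c:a", "aac", "output.mp4"], check=True)
```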
You have to remember how old the technologies you are trying to use are, and then consider the power of the computers available when they were made. MPEG-2 encoding used to require a dedicated expansion card because CPUs didn't have decent instructions for the encoding. Now that's all native to the CPU, which makes the code base archaic.
No doubt that my limited understanding of these technologies came with some naive expectations of what's possible and how it should work.
Looking into it, and working through it, part of my experience was a lack of resources at the level of abstraction I was trying to work at. It felt like I was missing something: on one end, video editors that power billion-dollar industries; on the other, directly embedding the ffmpeg libs into your project and doing things in a way that requires full understanding of all the parts and how they fit together; and little to nothing in between.
Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.
My initial expectation was that there had to be some in-between solution. Before I set out, what I thought would happen is that I'd `npm install` some module, create frames with node-canvas, stream them into this lib, and get an mp4 out the other end that I could send to disk or S3 as I please.* Worrying about the nitty-gritty details like how efficient it is, how many frames it buffers, or how optimized the output is, would come later.
Going through this whole thing, I now wonder how Instagram/TikTok/Telegram and co. handle the initial rendering of their video stories/reels, because I doubt it's anywhere close to the process I ended up with.
* That's roughly how my setup works now, just not in JS. I'm sure it could be another 10x faster at least, if done differently, but for now it works and lets me continue with what I was trying to do in the first place.
This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.
Pretty much everyone serving video uses DASH or HLS so that there are many versions of the encoding at different bit rates, frame sizes, and audio settings. The player determines if it can play the streams and keeps stepping down until it finds one it can use.
Edit:
>Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.
This is the beauty of using mature software. You don't need to know this any more. Encoders can now set the profile/level and bit depth to what is appropriate. I don't have the charts memorized for when to use which profile at which level. In the early days, the decoders were so immature that you absolutely needed to know the decoder's abilities to ensure a compatible encode was made. Now the decoder is so mature, and even native to the CPU, that the only limitation is bandwidth.
Of course, all of this is strictly talking about the video/audio. Most people are totally unaware that you can put programming inside of an MP4 container that allows for interaction similar to DVD menus: jumping to different videos, selecting different audio tracks, etc.
> This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.
I'm not sure I can follow. This isn't specific to MP4 as far as I can tell. MP4 is what I cared about, because it's specific to my use case, but it wasn't the source of my woes. If my target had been a more adaptive or streaming friendly format, the problem would have still been to get there at all. Getting raw, code-generated bitmaps into the pipeline was the tricky part I did not find a straightforward solution for. As far as I am able to tell, settling on a different format would have left me in the exact same problem space in that regard.
The need to convert my raw bitmap from rgba to yuv420 among other things (and figuring that out first) was an implementation detail that came with the stack I chose. My surprise lies only in the fact that this was the best option I could come up with, and a simpler solution like I described (that isn't using ffmpeg-cli, manually or via spawning a process from code) wasn't readily available.
> You don't need to know this any more.
To get to the point where an encoder could take over, pick a profile, and take care of the rest was the tricky part that required me to learn what these terms meant in the first place. If you have any suggestions of how I could have gone about this in a simpler way, I would be more than happy to learn more.
Using the example of ffmpeg, you can put things like -f in front of -i to describe what the incoming format is, so your homebrew exporter can write to stdout and pipe into ffmpeg, which reads from stdin with '-i -'. More specifically, '-f bmp_pipe -i -' would expect the incoming data stream to be a sequence of BMP images. You can select any format/codec your build supports ('ffmpeg -formats' and 'ffmpeg -codecs' list them).
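As a variation on the same idea, here is a minimal sketch in Python that skips image containers entirely and pipes raw RGBA frames into ffmpeg, letting it handle the yuv420p conversion and the H.264 encode. Dimensions, frame count, and file name are hypothetical; in a real project the generated frame would come from whatever renders the animation (node-canvas, Cairo, etc.).

```python
import subprocess

WIDTH, HEIGHT, FPS, FRAMES = 640, 360, 30, 90  # made-up parameters

# ffmpeg reads raw RGBA frames from stdin ('-i -') and writes out.mp4;
# nothing but the final file ever touches the disk.
ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pixel_format", "rgba",
     "-video_size", f"{WIDTH}x{HEIGHT}", "-framerate", str(FPS),
     "-i", "-",
     "-c:v", "libx264", "-pix_fmt", "yuv420p",
     "-movflags", "+faststart",
     "out.mp4"],
    stdin=subprocess.PIPE,
)

for i in range(FRAMES):
    # Trivial generated frame: a flat gray that brightens over time.
    shade = int(255 * i / FRAMES)
    frame = bytes([shade, shade, shade, 255]) * (WIDTH * HEIGHT)
    ffmpeg.stdin.write(frame)

ffmpeg.stdin.close()
ffmpeg.wait()
```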
In a way, that's good. The few hundred video encoding specialists who exist in the world have, per person, had a huge impact on the world.
Compare that to web developers, who in total have probably had a larger impact on the world, but per head it is far lower.
Part of engineering is to use the fewest people possible to have the biggest benefit for the most people. Video did that well - I suspect partly by being 'hard'.
My office is 7 minutes away on my bicycle. I work from home, where I have a better screen setup and privacy, instead of having to shuffle into a cubicle to make calls. I cycle to the office to have a beer with some colleagues every few weeks on an agreed date, and we meet there when we have to do some deep planning. Day to day my house is so much better.
Why would you want 1? Having the raster frames is surely better for post-production. I agree that models should take a stab at compression, but I think it should be independent. At the end of the day you also don't want to be doing video compression on your GPU; using a dedicated chip for that is so much more efficient.
Lastly, you don't want to compress the same way all the time. For low latency we compress with no B-frames and a smallish GOP; with VOD we have a long GOP, and B-frames are great for compression (roughly the two encoder setups sketched below).
2. As long as we can again port the algos to dedicated hardware, which on mobiles is a must for energy efficiency, for both encode and decode.
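To make the GOP/B-frame point concrete, a hedged sketch of the two x264 setups (standard libx264/ffmpeg options; file names are placeholders):

```python
import subprocess

# Low latency: zerolatency tuning, no B-frames, short GOP (~1 s at 30 fps).
subprocess.run(
    ["ffmpeg", "-i", "in.mp4", "-c:v", "libx264",
     "-tune", "zerolatency", "-bf", "0", "-g", "30",
     "low_latency.mp4"],
    check=True,
)

# VOD: long GOP and B-frames enabled, trading latency for compression.
subprocess.run(
    ["ffmpeg", "-i", "in.mp4", "-c:v", "libx264",
     "-bf", "3", "-g", "250",
     "vod.mp4"],
    check=True,
)
```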
Where do we price in externalities? We surely need to in manufacturing, construction, and such, but I don't see us doing it in any meaningful way now, so I am a bit surprised by your statement!