Not to this extreme, as far as I can tell, though I wasn't in the industry during the previous round of standardization. You can actually go and try out some of the proposals; one, for example, was called Nokia MVC (not to be confused with the much later H.264 MVC); see http://wiki.multimedia.cx/index.php?title=Nokia_MVC for more information. While it certainly was a kitchen sink of "interesting" features (affine motion compensation, KLT, etc.), it was not so slow as to be impossible to use.
But I can't say for sure that there weren't some totally insane encoders put forward by those involved.
What I suspect is going on is a case of exponents. That is, you have an algorithm that tries every possible A, an algorithm that tries every possible B, and an algorithm that tries every possible C. You optimize them independently, and then in the final encoder you submit, you run them all at the same time, optimizing every possible A, for every possible B, for every possible C, sending runtime sky-high.
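To make the arithmetic concrete, here's a toy C++ sketch with made-up option counts (nothing taken from the actual proposal). Three searches that are cheap when tuned separately turn into one enormous nested search when they all run at once:

    #include <cstdio>

    int main() {
        // Hypothetical option counts for three independent coding tools.
        const long a = 64, b = 32, c = 16;

        // Tuned independently, each tool only pays for its own options:
        std::printf("separate searches: %ld evaluations per block\n", a + b + c);

        // Combined, every A is tried for every B, for every C:
        std::printf("combined search:   %ld evaluations per block\n", a * b * c);
        return 0;
    }

With these made-up numbers that's 112 evaluations per block versus 32768, and the ratio only gets worse as each individual tool's search space grows.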
I'd read the code, but it's nearly undocumented C++, written by the kind of people who reimplemented their own option parser using C strings (in C++) in Visual Studio.
I had the impression that decoding is the time-critical operation: encoding is done once, but decoding is done many times.
Sure, faster encoding would be nice, but we won't get better compression for free.
The author didn't consider using a GPU to perform the encoding. Kind of fishy. For my application (not video encoding) I managed to reduce the computation time from 5h to 15s (600x faster!) by using a GTX280 GPU card.
If you get a 600x speed increase, you did something wrong in your CPU application; there simply isn't 600x as much processing power on a GPU. Typical increases are about 5-10x, a bit more in applications with heavy floating-point math.
More importantly, regardless of how important you think encoding is, if I can't encode a 5 second test sequence in less than a day, I'm not going to be able to do much experimentation with the software. Sure, you can port it to a GPU, but right now it isn't on a GPU, and right now, I'd like to experiment with it.
I am unfamiliar with the memory patterns in the application, but a 600x improvement in performance does not have to come from an increase in processing power.
If the algorithms have a lot of data reuse in their matrix computations, I can see achieving a 600x improvement compared to a hardware-cache-based architecture. If the CPU implementation doesn't do tiling (http://en.wikipedia.org/wiki/Loop_tiling) effectively (or can't), then it's going to shuttle a lot of data back and forth between cache and RAM.
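For anyone who hasn't seen tiling before, here's a rough C++ sketch of the idea applied to a matrix multiply (toy code with an assumed tile size, not anything from the encoder). The point is that each cache-sized block fetched from RAM gets reused many times before it's evicted:

    #include <algorithm>
    #include <vector>

    // Computes c += a * b for n x n row-major matrices; c must start zeroed.
    void matmul_tiled(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& c, int n) {
        const int T = 64;  // tile size assumed to fit comfortably in cache
        for (int ii = 0; ii < n; ii += T)
            for (int kk = 0; kk < n; kk += T)
                for (int jj = 0; jj < n; jj += T)
                    // Work on one T x T block at a time so it stays cache-resident.
                    for (int i = ii; i < std::min(ii + T, n); ++i)
                        for (int k = kk; k < std::min(kk + T, n); ++k) {
                            const float aik = a[i * n + k];
                            for (int j = jj; j < std::min(jj + T, n); ++j)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }

A naive triple loop streams the same rows and columns from RAM over and over; explicitly staging tiles in a GPU's on-chip memory has the same effect, which is where improbable-looking speedups can come from.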
The accepted term for this effect is super-linear speedup.
It depends on what you are doing. Generally you're right; however, we're currently doing work with a group that does live PPV events streamed over HLS in H.264, and in the live case encoding is very time-sensitive.
(this is a bit of a red herring - the times related in the story are pretty much useless)
Quote from x264's assembly guru: <holger_> whatever this guy did, 600x faster suggests a suboptimal cpu implementation and/or a very memory intensive workload.
The GPU, looking at each individual core, is actually a very weak general-purpose processor. GPGPU is good because there are a lot of cores, so you can run highly parallelizable tasks on it easily.
Video encoding is not one of those tasks, since each block depends on the previous block and each frame depends on the previous one, and sometimes even on the next frame.
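A toy illustration of that dependency (the arithmetic is a stand-in, not real codec code): the loop below carries a serial dependency from one iteration to the next, so there's no way to hand its iterations out to thousands of GPU cores at once:

    #include <cstddef>
    #include <vector>

    // Stand-in for intra prediction along a row of blocks: each block's value
    // depends on the block that was just finished.
    void predict_row(std::vector<int>& blocks) {
        for (std::size_t i = 1; i < blocks.size(); ++i)
            blocks[i] += blocks[i - 1] / 2;  // must wait for blocks[i-1]
    }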
There have been countless proposals to port or accelerate part of x264 on GPGPU. Nobody has succeeded in two years, not even with the motion search, which is supposedly the component that would be easiest to port and would benefit the most from GPGPU.
You've hit on it. The algorithm I implemented is memory intensive, and I used texture memory for storage. It is the backprojection step of tomographic reconstruction; the computation itself is very light.
I expect video encoding has the same pattern. One of the critical aspects of benefiting from GPU parallelism is the amount of state each thread has to maintain. It has to be kept to a minimum because that space is limited; if threads need more space, the number of active threads is reduced to match.
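Back-of-the-envelope with made-up numbers (real limits vary by GPU generation): dividing a fixed on-chip state budget by the per-thread state shows how quickly the number of active threads collapses.

    #include <cstdio>

    int main() {
        const long budget = 64 * 1024;  // hypothetical on-chip state budget, in bytes
        const long per_thread[] = {64, 256, 1024};
        for (long s : per_thread)
            std::printf("%5ld bytes of state per thread -> at most %5ld active threads\n",
                        s, budget / s);
        return 0;
    }

Fewer active threads means less latency hiding, which is exactly when the GPU stops looking 600x faster.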
You are right. Also computer power tends to increase pretty fast and get cheaper all the time. Also the software at this stage is likely not optimised for encoding speed. Also, that software was likely to have a lot of debugging flags, etc.
> Also computer power tends to increase pretty fast and get cheaper all the time.
The spec is being written now. The encoder needs to be tested and evaluated now. There's no point in doing it now if we have to wait ten years for CPUs fast enough to test whatever we come up with.
> Also, that software was likely to have a lot of debugging flags, etc.
It was released in source form, so those are easily disabled / removed.
> so I decided to try out the proposal that appeared the most interesting: the Samsung+BBC proposal (A124), which claims compression improvements of around 40%.
At first I read this and couldn't believe it. A 40% improvement! Wow!
But reading the rest of the article, it's disappointing to see that it's not a practical 40%...
Makes me wonder if GPUs could be used to do the encode and speed it up enough to make it practical (maybe within a few generations of GPUs...). After all, you have to aim for where computers will be by the time this becomes a standard, not where they are now.
It is a form of content protection: hardware capable of dealing with it will reach consumers much further down the road than it reaches the monopolies. The old format will be "good enough" for home video until the inevitable comes.
Would GOOG fast-track VP8 as an ISO standard, like MS did with OOXML? After all, it is ubiquitous in Flash video, just as Word is -- a de facto standard needing rubber-stamping.
I've also wondered whether the hardware needed for H.264 was a form of dongle (http://en.wikipedia.org/wiki/Dongle#Copy_protection for youngsters who don't remember this early form of DRM), but I've concluded it's just because they come from the Hollywood and hardware world rather than being internet and software guys.
Also note that Microsoft only got to fast-track OOXML as an ISO standard because it had already been passed as a standard by ECMA, not because of anything inherent in OOXML (which wasn't even a de facto standard, the binary .doc was).
VP8 is NOT ubiquitous in Flash video, nor could it ever be; support for it was never released. Only VP6 is supported, and how many relevant people do you still see using VP6?