- Next-gen codecs provide 50% bitrate improvements over x264, but are 10-20x as slow at the top settings required to achieve such results (a rough sketch of those settings follows this list).
- Normalized for CPU usage, libvpx already has some selling points when compared to x264; x265 is still too slow to be useful outside of very high-end scenarios.
- ffvp9 is an incredibly awesome decoder that outperforms all other decoders.
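For anyone curious what "top settings" translates to in practice, here is a rough sketch of timing the three encoders at their slowest presets through ffmpeg. The clip name and CRF values are placeholders I picked for illustration, not rate-matched settings from the article, so treat this as a speed-comparison harness rather than a fair quality test:

    import subprocess
    import time

    # Hypothetical test clip; substitute any source you have lying around.
    SRC = "input_1080p.y4m"

    # Rough "slowest/best" settings for each encoder as exposed through ffmpeg.
    # The CRF values are placeholders and are NOT rate-matched across codecs.
    ENCODERS = {
        "x264": ["-c:v", "libx264", "-preset", "veryslow", "-crf", "22"],
        "x265": ["-c:v", "libx265", "-preset", "veryslow", "-crf", "24"],
        "vp9":  ["-c:v", "libvpx-vp9", "-b:v", "0", "-crf", "30",
                 "-quality", "good", "-cpu-used", "0"],
    }

    for name, args in ENCODERS.items():
        start = time.time()
        subprocess.run(["ffmpeg", "-y", "-i", SRC, *args, f"out_{name}.mkv"],
                       check=True, capture_output=True)
        print(f"{name}: {time.time() - start:.1f}s")

Swap in whatever source and quality targets you like; the point is only the wall-clock comparison between the slow presets.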
One downside of VP9: the VP9 spec is still under NDA from Google. There is no publicly available spec. As far as I know, the only people that have written a software VP9 decoder are either Google employees (libvpx) or ex-Google-employees who worked on VP9 when they were there (rbultje, who wrote ffvp9 with Clément Bœsch).
There's a public RFC, but it's by no means usable as a bitstream spec.
There's an official text document describing the bitstream format (several hundred pages), but it's under NDA.
We use it at work, and getting an updated version is far more trouble than downloading a file on a public server.
The document in itself is nearly worthless, though, as it's far from complete. In the end, you still need to dig into libvpx to understand how things work.
Lots of people have produced decoders for VP9, including both hardware and software vendors: companies like Ittiam, Intel, Samsung and Qualcomm.
Lack of a non-NDA public spec may have made things more difficult for some, but it certainly didn't stop everyone.
Actually, as a decoder implementer, I found that having the C source code of a reference decoder (libvpx), whose behaviour is authoritative over any text document, made things a lot easier.
Because now, whenever you ask yourself "how should a VP9 decoder behave in this situation?", you have an unambiguous answer: "do what libvpx does".
We didn't have this luxury with HEVC. The HEVC reference decoder (HM) is horribly complex, and has been easy to crash in many different ways for years. As a consequence, it could by no means be considered authoritative, which is a pity, because there were a lot of discrepancies between the reference decoder and the .doc spec.
> ffvp9 is an incredibly awesome decoder that outperforms all other decoders.
Not to take away from the hard work which has gone into ffvp9, but that should be "all other open-source decoders" without benchmarks comparing it against the proprietary implementations which most people actually use. ffvp9 may be great but still the wrong choice if someone has a hardware/GPU-accelerated H.265 codec in their OS and would prefer longer battery life over the use of free software.
Firefox added support for Intel's VP9 hardware decoder, but had to disable it because of Intel bugs, and hardware performance was often worse than the available software decoders!
This seems like an opportunity for hardware acceleration (CPU/GPU instructions / x265 accelerator) to reduce the power consumption on video playback for the majority of devices. Perhaps it's cheaper and simpler to make a co(+)dec chip that can encode too.
NVIDIA currently supports HEVC decoding on its GTX 900 series (and the 750 Ti/SE, as they are also Maxwell) GPUs.
The iPhone 6s has H.265 decoding as well (works flawlessly with Plex without transcoding).
Qualcomm has support for HEVC across all of its latest generations, from the 200 (2xx, 4xx, 6xx) through the 800 (808 onward) series of SoCs.
I have not had issues playing 1080p x265 content on any of my mobile devices in the past 18 months or so, since they started releasing it.
There's just something to be said about 200-300MB per 1080p episode and about 700MB per movie.
I can cram an entire library onto most of my current mobile devices for a flight/long commute without having to worry about SD cards/USB OTG.
Am I missing something, or are that article's conclusions pretty disingenuous? Then again, the author did work for Google on VP9, so I'm not totally surprised.
Yes, H.265 encoding is 10-20x slower (at the highest quality setting), but decoding was only 20% slower than H.264. And it still decodes faster than 60fps, below which almost all movie content sits.
If all that mattered in a codec were encoding speed, VP9 would be dominant. But it isn't: encoding speed is largely irrelevant compared to decoding speed (which affects the majority of codec users) and device support. On the latter, H.265 is in an unassailable position.
ffvp9 is indeed awesome. There was a talk at VideoLAN Dev Days about its development. They outperform the reference decoder and optimized basically everything they could touch.
Interestingly, the place I've seen the most widespread adoption of HEVC is China: Android-based set-top boxes with hardware H.265 decoding are widely[1] available[2], and a number of the major IPTV services (both authorised and unauthorised) are using it, including for live content.
I was quite surprised to find I was able to stream a very high quality 1080p movie on a random unbranded Chinese IPTV box at ~2Mbps that looked every bit as good as what Netflix streams at 5Mbps.
Boxes with 4Kx2K H.265 hardware decoders have been around since 12/2014, with SoCs from Rockchip, Amlogic and AllWinner.
Volume prices start at $40, depending on RAM/eMMC size (1G/8G to 2G/16G). Yes, that's the price for a packaged box including power plug and lots of cables ... not just a PCB.
Has anyone used these? The one I had used an outdated fork of Android 4 and could never be updated because the hardware decoder could only be used with an OEM (binary only) video player.
Yeah, the pre-installed software is usually crap. But there are plenty of modded ROMs for the more popular boxes. And you can't beat the price as an OEM product -- just put your own software on it.
The source release policy has gotten better in the past year. Mainly due to Google (Chromebox), Hardkernel (ODroid) and others kicking the manufacturers. At least they're upstreaming their drivers now. They have to clean up ~1M lines per SoC family, that'll take some time. And userspace is not just blobs anymore. For example Amlogic regularly publishes a (working!) buildroot at http://openlinux.amlogic.com:8000/download/ . Microcode for VPU or Wifi still comes as blobs.
Mind blown - I never thought I'd see a buildroot setup like that. Spreadtrum have servers like that too, but you'd be hard pressed to grab the sources while the servers are up...
I've used both Xiaomi's box[1] and LeTv's[2]. Both are quite nice hardware at rather amazing price points (299CNY/44USD and 290CNY/44USD, respectively). They are more or less intended to be disposable and don't get much in the way of software updates (though they work fine for their intended purpose), but there's a vibrant community that has released alternate ROMs.
I don't have much need for 4K, but I have been considering re-encoding all my (ripped) Blu-ray content in HEVC 1080p, as I do notice the difference. The file sizes on my NAS have left me favoring 720p/H.264, but the size difference for 1080p/H.265 is minimal and the quality is noticeably better.
They don't, but they've been getting hit hard from both domestic and international rightsholders. There's also been a push against IPTV services offering foreign content/live channels on 'national security' grounds in recent months, and quite a few companies have disappeared or gone deep underground.
The big players—LeTv, Tencent—have been fairly diligent about getting proper licenses for content over the last year or so. LeTv just scored rights to the Premier League, for example.
I don't think that's the case, given a number of them are being sued by both Chinese and foreign broadcasters and media companies.
Others are pivoting into Netflix-esque licensing businesses. They are to media companies what Uber is to taxis: they are ignoring the rules to bring a better service, and it's working.
I wonder if the choice of encoder has an impact on the results; at a quick glance they appear to use the Fraunhofer reference encoder for H.264, whereas afaik x264 is the state of the art wrt subjective quality.
In my unprofessional opinion, 8 Mbps sounds quite low for 4K H.264 encoding; I would expect that sort of rate for good-quality 1080p encoding (Blu-rays being in the 20 Mbps range, afaik).
The choice of encoder has a huge impact on the results - e.g. hardware H.264 encoders in phones will hardly produce anything watchable at 5Mbit@720p, while x264 produces watchable output at 2Mbit at the same resolution.
Same goes for those other encoders - they vary hugely in quality and CPU usage at the same bitrate and resolution.
The reference encoder is known to have quality that's quite a bit worse than x264, so this paper is probably overstating the advantage of H.265 over H.264.
This paper is quite old and has known problems with setting a constant quantizer for VP9. The results do not match my own findings comparing libvpx and x265 at all [1]. libvpx VP9 is still worse than x265, but the difference is quite small.
We downloaded a random git commit a year before a release was made and claimed it was officially released. Best case scenario, we don't understand the difference between a spec and implementation being finalized. Worst case: intentional hatchet job.
How do the complexity requirements compare? H.265 decoders have memory for lots of previous frames, but VP9 could be at a handicap if the format limits backward references to just a few frames.
BBC apparently does not consider VP9 to be a major video coding standard.
>The High Efficiency Video Coding (HEVC) standard has been developed jointly by the two standardisation bodies ITU and ISO (as has been the practice with all major video coding standards in the past 3 decades);
And that's probably the right call for the BBC. Hopefully they'll get behind the NetVC [1] effort and support that in the future. The NetVC codec is to be built from Daala and Thor, and maybe VP10 as well. To complement NetVC there's also the Alliance for Open Media [2], but I'm a bit disappointed that the only public output from AOMedia so far seems to be a press release from September last year.
Select "Entire curve (old method)" to see BD-rate numbers (the new method requires data overlap in all three bitrate ranges in order to work, which the old curve does not have).
According to another comment[1], to achieve 50% savings, encoding time needs to be 10x to 20x longer. I suspect that in the race to be the first group to release, they use faster settings at the expense of artifacting.
I don't think there's a Scene Standard [1] for HEVC yet. Even the one for x264 is a few years old and might not represent state-of-the-art.
If you're talking about P2P groups - all bets are off. There's even major groups of regional cappers who still use XVid... as a hobbyist archivist this makes me very sad.
It's a crapshoot... I've seen some quality encodes with HEVC, and have been opting for HEVC 1080p when available at around the same size as 720p AVC.
Most of my own Blu-ray rips have been 720p/x264, and I'm thinking about re-ripping to HEVC for the better quality... size on my NAS has been a minor concern... but having it all available from Kodi is the best.
Unfortunately the piracy scene doesn't seem to be as cohesive as it used to be - the x264 standards are years old now and I haven't even seen whispers of an HEVC standard. You really don't want joe-random-encoder running ffmpeg with some copy-pasted parameters to become the de facto standard.
This is completely offtopic, but I'd love to do some encoding tests, except that the canonical test clips seem to be from Big Buck Bunny - animation encodes differently than live-action video.
Does anyone know of some Free-as-in-speech clips from a wider range of sources/cameras that I can use without being sued into oblivion? Australia doesn't really have Fair Use provisions.
Australia doesn't have a Fair Use provision because it doesn't need one.
Australia allows use of copyrighted material for research/teaching purposes (up to a percentage limit). As in, it isn't against the law at all. Whereas a Fair Use provision is a defence when you are charged with breach of copyright (much like "self defence" in an assault charge).
(This is how, even in Australia, teachers are allowed to photocopy material for classes, and journalists can report on other materials without being sued.) The Copyright Agency tries to police this and charge royalties if applicable.
Big Buck Bunny is from a number of films by the Blender Foundation. They also made the live-action movie Tears of Steel, which can be downloaded uncompressed: http://media.xiph.org/tearsofsteel/
Wasn't it supposed to be 75% savings? When HEVC first came out the claim was that 4K HEVC-encoded video would have the same bitrate as 1080p AVC-encoded video.
In fact quadrupling the number of pixels does not necessarily result in a 4x multiplication in bitrate, and in general it is much less. That is because a movie recorded, for example, at 4K resolution is not at all the same as 4 different/independent 1080p movies that you would stick together in the same 4K frame. There are correlations/statistical properties that are exploited by the encoder in the first case and that do not exist in the second case. I have seen somewhere (though I can't find the reference) a back-of-the-envelope calculation using Fourier analysis showing that to quadruple the number of pixels you only need to double the bitrate. So, if HEVC has 50% bitrate savings, that would explain the claim that 4K HEVC could have the same bitrate as 1080p AVC. Of course there is much more happening in HEVC than just a Discrete Cosine Transform (motion compensation etc.), so I don't know how this really applies in practice, and I haven't done the tests myself...
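A toy numerical sketch of that intuition, assuming a 1/f amplitude falloff as a crude stand-in for natural-image statistics (that falloff is my assumption, not something from the article): it checks how much of the spectral energy already lives in the frequencies a half-resolution image can represent.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 512  # "full resolution" grid; a half-resolution image would be N/2 x N/2

    # Toy spectrum: random phases with a 1/f amplitude falloff (an assumption),
    # which is a rough stand-in for natural-image statistics.
    fy = np.fft.fftfreq(N)[:, None]
    fx = np.fft.fftfreq(N)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0  # avoid dividing by zero at DC
    spectrum = (1.0 / f) * np.exp(2j * np.pi * rng.random((N, N)))

    # Coefficients a half-resolution image could already represent:
    # |fx| and |fy| at or below half the full-resolution Nyquist limit.
    low = (np.abs(fx) <= 0.25) & (np.abs(fy) <= 0.25)
    energy_low = np.sum(np.abs(spectrum[low]) ** 2)
    energy_all = np.sum(np.abs(spectrum) ** 2)
    print(f"share of spectral energy below half Nyquist: {energy_low / energy_all:.1%}")

With that falloff, most of the energy sits in the low-frequency quarter of the coefficients, so quadrupling the pixel count mostly adds low-energy, cheap-to-code detail, which is consistent with bitrate scaling well below 4x.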
The 2x factor is pretty intuitive if you understand that video compression is still fundamentally representing superpositioned signals from 2D Fourier analysis, and that multiplying the number of pixels by 2x in each direction is no different for perception than doubling the DPI. Doubling the DPI yields up to 2x the possible frequencies that need to be represented, up to the Nyquist frequency that would cover each possible interpolated pixel. This is part of why noise in film or audio makes representation much tougher - noise is typically higher frequency and rather random (although the distribution depends on whether it's brown, white, pink, etc. noise).
4x the number of points in real space -> 4x the points in Fourier space; the Fourier space is still 2D. I don't get your reasoning.
OTOH, if you increase the DPI you bring in higher-frequency components that are not that important for perception, so you can compress them more heavily.
I was confusing it with a different concept, disregard that part. The FFT is by definition reversible for a discrete signal like a quantized image, so each pixel must be recoverable, and thus it has to be 4x the total space used with no further operation, correct.
Quantization and filtering are more important parts of the encoder than the FFT/DCTs, since the transform is 1:1 reversible. Compression isn't just the mathematical accuracy of a signal when it comes to lossy algorithms, as you know. A 720p video upscaled to 1440p should theoretically compress to exactly the same size for the same effective quality, but encoders don't care about just the math: they apply perceptual filters, because simply doubling pixels turns out to look really bad perceptually.
Another side of it is that video compression, like image compression (and analogously audio compression), works largely by exploiting limitations of human vision. We can't discern high spatial frequencies as precisely as we can discern low frequencies.
It stands to reason that if you shrink those details even smaller, as is the case with 4K, you'll need even less precision for those coefficients.
I don't think a compressed video with 4 times as many pixels will be 4 times as large.
Consider you already have one pixel of each 2x2 pixel group: then you can predict very well what the other pixels look like, so you don't need as much data anymore.
IIRC, 4k AVC is only twice the bitrate of 1080p AVC (or something like that - I know that the scaling is less than 1:1, though), which would make 4k HEVC have about the same bitrate as 1080p AVC.
For livestreaming, NVIDIA graphics cards offer NVENC (and I believe AMD has an equivalent), a hardware HEVC encoder that can easily meet most people's needs.
The problem with livestreaming using hardware encoders is that the quality is utter garbage compared to software encoders, which means that given the bitrate available to you it may in fact be so ugly to be unwatchable.
A typical home DSL connection does not have enough bitrate to produce a quality mid-to-high resolution stream using NVENC, and even if you have a good connection, Twitch caps out at 3500kbps. The gap between something like NVENC/QuickSync and x264 is enormous.
At the very least, HEVC's huge bitrate/quality advantage over H264 will help it here, but when it comes to GPU encoders they will just be attempting to catch up with what you can get right now using x264.
Live streaming quality aside, though, you are right that the hardware will make it possible for people to do live streaming in scenarios where they previously would've been unable to, because you can combine a hardware HEVC encoder with a low bitrate connection and at least get something watchable.
Do you have data to back that up? I've found that when comparing SSIM and PSNR, NVENC's H.264 encoder is able to quite handily beat x264 for quality while targeting the same bitrate and encode speed. This was using a Gen 2 Maxwell GPU and comparing to a mid range Xeon.
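If anyone wants to reproduce that kind of measurement, ffmpeg's built-in ssim and psnr filters will score an encode against its source. A rough sketch, with placeholder file names (the psnr filter works the same way but prints "average:" rather than "All:"):

    import re
    import subprocess

    # Placeholder file names: the reference is the pristine source,
    # the distorted file is the encode being scored against it.
    REFERENCE = "source.y4m"
    DISTORTED = "encode_nvenc.mp4"

    def mean_ssim(distorted: str, reference: str) -> float:
        """Run ffmpeg's ssim filter and pull out the overall 'All:' score."""
        result = subprocess.run(
            ["ffmpeg", "-i", distorted, "-i", reference,
             "-lavfi", "[0:v][1:v]ssim", "-f", "null", "-"],
            capture_output=True, text=True)
        match = re.search(r"All:([0-9.]+)", result.stderr)
        return float(match.group(1)) if match else float("nan")

    print(mean_ssim(DISTORTED, REFERENCE))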
It's possible software is the problem here. I have certainly never seen a way to get acceptable quality out of NVENC in common consumer game streaming software like OBS, and I've tried at length including adjusting profiles and tuning settings. It doesn't help that NVENC's documentation is atrocious.
NVENC also mishandles color spaces considerably (full vs limited range), which is a big handicap to begin with.
SSIM and PSNR aren't really the question here either, the issue is totally perceptual. I do not doubt that on a speed basis NVENC is competitive or outperforms, the question is simply whether you can deliver something watchable that hits under the user's upstream cap. If you look at average twitch streams of even mundane video games, many of them are borderline unwatchable - this is a pervasive problem :-) Kicking butt on image-wide SSIM and PSNR won't be much use if all the text in a story-driven game is unreadable due to artifacts.
I've used consumer embedded hardware encoders in the past (quicksync, nvenc) using OBS, and they simply could not get enough quality at a streamable bitrate. They were however amazing to capture (near-)lossless recordings of an application at no performance cost.
I assume that dedicated hardware encoders/capture cards do not suffer from this, but it would be amazing if high-quality streaming became available to the public without requiring dedicated hardware.
It's usually the case with new generations of software encoders... I remember how long DVD rips would take in 2000, and how slow H.264 was when it became more common.
Quick Sync also isn't exactly what I'd call a high-quality encoder, it is useful for things like realtime streaming or screencast recording, but if you are encoding movies or TV shows you will want a "real" HEVC encoder.
GM107, GM204, and GM200 all use a combination of a GPU shader kernel with some CPU assistance. GM206 (in the GTX 960 and 950) implements a newer revision of the PureVideo decoder that allows H265 to be decoded directly on the video engine.
https://blogs.gnome.org/rbultje/2015/09/28/vp9-encodingdecod...