
Unfortunately, userspace support for this (e.g. in VAAPI and ffmpeg) is not done yet. Until VAAPI support is implemented, videos in Firefox will be unaccelerated. I think it is the same deal for Chrome.



HW acceleration on Linux was fixed about a year ago https://9to5linux.com/firefox-81-enters-beta-gpu-acceleratio...


Firefox uses VA-API. That library does not support this hardware.

Edit: as explained below, the linked work is for specific ARM hardware like the rk3399 SoC.


> Until VAAPI support is implemented, videos in Firefox will be unaccelerated. I think it is the same deal for Chrome.

That is an issue with Firefox; other software (mplayer, mpv) has supported VAAPI for many years. And with youtube-dl integration in mpv, why even play videos in Firefox?


Nope, not an issue in Firefox but an issue in VAAPI. Firefox supports VAAPI just fine; VAAPI does not support this hardware/API. Considering it is a new API, I am still holding out hope that support gets added.
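
To make "VAAPI does not support this hardware" concrete: the VA driver simply exposes no decode profiles for these VPUs, which is what a client (or vainfo) discovers at startup. A minimal sketch of that query, with the render-node path and error handling as placeholders (link with -lva -lva-drm):

    /* Sketch: list the decode profiles the VA-API driver exposes. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <va/va.h>
    #include <va/va_drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed render node */
        if (fd < 0) { perror("open"); return 1; }

        VADisplay dpy = vaGetDisplayDRM(fd);
        int major, minor;
        if (vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
            fprintf(stderr, "no usable VA-API driver for this device\n");
            return 1;
        }

        int num = vaMaxNumProfiles(dpy);
        VAProfile *profiles = malloc(num * sizeof(*profiles));
        vaQueryConfigProfiles(dpy, profiles, &num);
        for (int i = 0; i < num; i++)
            printf("profile %d is supported\n", profiles[i]);

        free(profiles);
        vaTerminate(dpy);
        close(fd);
        return 0;
    }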


https://github.com/noneucat/libva-v4l2-request#branch=fix-ke...

And there's probably some branch somewhere that supports VP9 too.


This solution isn't great; there is direct support for the V4L2 codec API inside ffmpeg, so VAAPI is not useful in this case.
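
For example, ffmpeg exposes the stateful V4L2 M2M wrappers as named decoders that can be selected directly, no VAAPI involved; the stateless request API this thread is about still needs patched builds as far as I know. A rough sketch, assuming a build with h264_v4l2m2m enabled (link with -lavcodec -lavutil):

    /* Sketch: ask ffmpeg for its V4L2 M2M H.264 decoder by name. */
    #include <stdio.h>
    #include <libavcodec/avcodec.h>

    int main(void)
    {
        const AVCodec *codec = avcodec_find_decoder_by_name("h264_v4l2m2m");
        if (!codec) {
            fprintf(stderr, "V4L2 M2M decoder not compiled into this ffmpeg\n");
            return 1;
        }

        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        if (!ctx || avcodec_open2(ctx, codec, NULL) < 0) {
            /* Opening probes /dev/video* for a node that accepts H.264. */
            fprintf(stderr, "decoder present but no usable decoder node found\n");
            avcodec_free_context(&ctx);
            return 1;
        }

        printf("opened %s\n", codec->name);
        avcodec_free_context(&ctx);
        return 0;
    }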


Wouldn't the gstreamer support mentioned in the patch description directly enable hardware acceleration in Firefox? Or do I misunderstand to what extent Firefox is using gstreamer at the moment?


Firefox does not use gstreamer at all AFAIK.


Ah, I confused it with ffmpeg, which also goes through VAAPI of course.

Also, gstreamer was apparently added for something at one point, but that was 7 years ago; I guess getting video decoding running was a different story.

https://wiki.mozilla.org/index.php?title=Special:Search&limi...


The usefulness of hardware acceleration for video decoding is highly debatable.

1) It's not always much more energy efficient. Sometimes it is, but by less than you'd think; GPUs need power too

2) It greatly increases the complexity of client software, which has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

3) Driver quality is usually terrible: lists of working hardware/software combinations have to be maintained, and in some cases holes have to be punched in sandboxes [1]

4) HW support usually lags behind state-of-the-art encoding. YouTube is already using AV1, but the vast majority of devices won't support it in hardware before something else comes along

5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.

EDIT: I'm mostly talking about the desktop/laptop use case here, where things are very fragmented. On a mobile phone, where manufacturers control hardware and software end to end, it's a different story.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1698778


> The usefulness of hardware acceleration for video decoding is highly debatable.

Disagree. On low-end hardware the advantages are clear. On my older Intel NUC I can play 1080p H.264 (using mpv) hw-accelerated with 15% CPU load, or software-decoded with 75% CPU load. In the first case the NUC is silent; in the second the core temperature rises and eventually its fan starts spinning.


> On my older Intel NUC I can play 1080p H.264 (using mpv) hw-accelerated with 15% CPU load, or software-decoded with 75% CPU load

These numbers are meaningless without measuring watt-hours used for the task.

I was able to play 1080p H.264 video with hardware acceleration on an 8800 GS with an Athlon X2 5000 with about the same CPU utilization, back in 2008-2009. There was a special library (shareware) that enabled HW acceleration way before it was commonplace on integrated GPUs. I forget what it was called, but it was Nvidia/CUDA only.

That was 12+ years ago.

Obviously GPUs have become more efficient since then, but so have CPUs. It also matters how the video stream was encoded. It's entirely possible that, with certain encoding options, hardware decoding's advantages are almost entirely negated.


Also, there are "levels" of hardware acceleration: using CUDA (or any other shader-level acceleration) will always be less efficient than a dedicated hardware block.

And there are multiple steps in decoding a video. Some steps in some codecs fit different acceleration schemes better, so a full-pipeline decode may not be worth the hardware cost at one point; later, transistors get cheaper or new HW decode techniques are discovered, so more steps can be done in dedicated hardware blocks. Those hardware blocks may also have hard limits: if one can only cope with, say, 1080p60 at a certain profile level for a codec, trying to do anything more than that will likely skip the HW block completely - it's hard to do any kind of "hybrid" decode if it's not a whole pipeline step.

"HW Video Decode Acceleration" isn't a simple boolean.


Hybrid decoders that use GPU shaders are somewhat rare; HW decoding pretty much always means "ASIC". And ASIC power draw for decoders is typically in the <1W range.

For dav1d, even YouTube-tier 1080p SW decoding is using +4-5W on my laptop, and 4k60 is +15-20W.


> ASIC power draw for decoders is typically in the <1W range.

Many times even "standalone" HW decoders use or share GPU components (e.g., almost always the memory). Just bumping up the GPU's memory controller clock already consumes >10W on my system.


It's hard for me to imagine that video decoding would need a significant bandwidth boost like this. That seems like either a driver or hardware issue, and one that ought to be solvable. 4k60 is about 12 Gbps of raw output (3840×2160 px × 3 bytes × 60 fps); even inflating that number a bunch, it's hard to imagine most discrete graphics cards' memory needing more than its base clock to serve this.

On mobile at least, where graphics are integrated and use main memory, there ought to be little or no difference in memory throughput use.

Last, some new GPUs like AMD's Navi (RX 6xxx) have on-package caches, "Infinity Cache", of I think 64-128MB. I want to think these could be used like Intel's Crystal Well L4 eDRAM, to keep from needing to go to main memory at all. How much of a win that is, and whether it would even be possible, I'm not sure.

I'm somewhat skeptical that there really is a problem here. If there is, I suspect it's somewhat rare and probably a bit of an oversight. I should test, though. I would love to get a wider picture of what the real impacts of video decoding are.


>HW decoding pretty much always means "ASIC"

Indeed. For example, hardware decoding is the difference between choppy video and smooth video on the PinePhone because the CPU isn't powerful enough and the GPU is useless for decoding.

(And to fguerraz's edit that their comment doesn't apply to mobile phones "where manufacturers control hardware and software end to end", the manufacturer does not control the software on the PinePhone.)


Yes, again, I'm talking about PCs here, where it's usually implemented in shaders.


No it isn't. "NVDEC" is an actual ASIC block in the GPU silicon. It's not "shaders". Same with AMD's VCN. And Intel's QuickSync.

If it was just shaders then there'd be basically no concerns with driver quality or hardware support, just like there aren't with CPU decoders.


To be exact, it depends on the hardware generation. At least for Intel and AMD, the first versions tended to use more shaders; then they switched to ASICs. Intel actually open-sourced the shaders that they use.


So was I? Which phone can even achieve a 20W power draw...

The only hybrid VP9 decoders were AMD's, which only supported Windows and which they stopped shipping years ago (any current/Linux AMD drivers that support VP9 decoding do so only via an ASIC), and Intel's, which was only supported on three GPU generations (Gen7.5, Gen8, and Gen9) and was obsoleted by an ASIC in Gen9.5.


But the email is about ARM SoCs with dedicated VPU IP blocks.


>"2) It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality"

I happen to have my own product with just that - software and HW-accelerated decoding. It plays videos in a few resolutions, and the presence of HW acceleration allowed me to play 4K videos (first on the market in my segment) with close to 0% CPU consumption on low-end PCs. Competitors at that stage would not even dream of offering 4K content.

As to "poorer software quality" - please do not play FUD. I just looked at the source code - the HW accelerated path (decodes from source to DirectX texture) added miniscule 1200 lines of code good chunk of which are headers / declarations. The software is being used by tens of thousands of clients and I have about zero reports where enabling HW decoding has lead to error.


This is a kernel API for VPUs, not for GPUs.

The power reduction is not really questionable. You can't achieve smooth playback at full resolution without the VPU on the devices where these things are used.
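
To make the VPU/GPU distinction concrete, here is a rough sketch of talking to such a decoder node through the V4L2 API: query the driver and list the compressed formats its OUTPUT queue accepts. The /dev/video0 path is an assumption; real code scans the video nodes:

    /* Sketch: identify a V4L2 M2M decoder and list its input codecs. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct v4l2_capability cap;
        memset(&cap, 0, sizeof(cap));
        if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == 0)
            printf("driver: %s, card: %s\n", cap.driver, cap.card);

        /* Compressed formats accepted on the OUTPUT (bitstream) queue. */
        struct v4l2_fmtdesc fmt;
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        while (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) == 0) {
            printf("accepts %s\n", fmt.description);
            fmt.index++;
        }

        close(fd);
        return 0;
    }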


"It increases greatly the complexity of client software that has to implement both accelerated and unaccelerated decoding, leading to poorer software quality"

Software without functionality is really simple! The same argument applies to supporting Unicode, both text directions, high-DPI scaling, catering to the visually impaired, or having any sound more complex than MIDI.


> 2) It greatly increases the complexity of client software, which has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

Only if you are not using any abstraction layers. GStreamer should take care of using a hardware decoder if available, otherwise falling back to software decoding.
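
For illustration, a minimal GStreamer sketch (not Firefox's actual code) of exactly that: playbin picks a hardware decoder when the driver exposes one and silently falls back to software otherwise. The URI is a placeholder; build against gstreamer-1.0:

    /* Sketch: let playbin choose HW or SW decoding automatically. */
    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        gst_init(&argc, &argv);

        GstElement *play = gst_element_factory_make("playbin", "player");
        g_object_set(play, "uri", "file:///tmp/example.mkv", NULL); /* assumed path */

        gst_element_set_state(play, GST_STATE_PLAYING);

        /* Wait for error or end-of-stream; the HW/SW decoder choice has
         * already been made inside decodebin based on element ranks. */
        GstBus *bus = gst_element_get_bus(play);
        GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
        if (msg)
            gst_message_unref(msg);

        gst_object_unref(bus);
        gst_element_set_state(play, GST_STATE_NULL);
        gst_object_unref(play);
        return 0;
    }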


> The usefulness of hardware acceleration for video decoding is highly debatable.

No it isn't. There's a reason it's used on 99% of consumer devices. Hardware companies are generally not in the business of adding to the BOM cost for no reason. Linux alone is the outlier.

> It's not always much more energy efficient. Sometimes it is, but by less than you'd think; GPUs need power too

"As you can see a GPU enabled VLC is 70% more energy efficient than using the CPU!"

https://devblogs.microsoft.com/sustainable-software/vlc-ener...

On the more recent Apple M1, chrome-hw shows a quarter of the power consumption of chrome-sw on the same video: https://singhkays.com/blog/apple-silicon-m1-video-power-cons...

Also hardware decoders have consistent performance, which is not true of CPU-based decoders. This is especially problematic & obvious at high resolutions. Windows & MacOS ultrabooks can do 4k video all day long without an issue. Linux ultrabooks get noticeably choppy at 1440p and 4k is right out.

This is also why you'll find ultra-low-end SoCs regularly prioritizing hardware decoders over faster CPUs, notably those in every smart TV and the majority of TV streaming dongles/sticks/boxes. Which really shouldn't be surprising: fixed-function hardware has always been drastically more efficient than programmable hardware, and video has changed nothing about that.

> 2) It greatly increases the complexity of client software, which has to implement both accelerated and unaccelerated decoding, leading to poorer software quality

Sounds like a job for a library, which is how every other OS makes this a non-issue.

> 4) HW support usually lags behind state-of-the-art encoding. YouTube is already using AV1, but the vast majority of devices won't support it in hardware before something else comes along

YouTube also still serves VP9, so power efficiency didn't regress on existing hardware, and mid-tier TV SoCs with AV1 decoder support are already here (such as the Amlogic S905X4). Sony's 2021 BRAVIA XR line also has HW AV1 decoders up to 4k.

> 5) Highly optimised decoders, such as dav1d, are extremely effective and save bandwidth and power compared to HW VP9.

Care to back that up with a source? All I can find are statements that dav1d is fast, but I can't find any evidence that it is efficient. The only thing I can find is this: https://visionular.com/en/av1-encoder-optimization-from-the-...

which shows dav1d using more power than ffmpeg-h264 but less than openhevc - but those are also software decoders, which, as above, take significantly more power than hardware decoders for the same codecs.


Disagree.

I can run multiple 1080p Twitch streams with mpv via streamlink, with the appropriate decoder flags set, while using Chromium to watch even one stream puts a lot of strain on my laptop and gets the fan running immediately.

So from my perspective it is very useful to offload video decoding to the GPU and leave CPU cycles for other work. Is it more energy efficient? I never checked, but the GPU fan does not really spin any faster, and looking at the temperature graphs it does not seem to strain it much.

I tried enabling GPU acceleration in the browser (Chromium-based) and I still don't really know why it is so flaky and unreliable.



