In the case of pixels hidden behind views that do not use blending, overdraw only costs a small amount of memory bandwidth: the check against the depth/stencil buffer for each covered pixel. That check is highly optimized in hardware, which can reject whole blocks of pixels while reading only a few bits. (Android UI does use the depth/stencil buffer, right?) However, I don't think that's what the article is talking about: "You can see that the transparent pixels of the bitmaps count against your overdraw."
In the case of visible views that do use blending, overdraw multiplies the time spent on shader computation right alongside the full memory bandwidth consumption of the shader (much more than just a depth/stencil check). It's true that it's entirely possible to write slow shaders that chug even at 1x overdraw. But at 3x overdraw it will be 3x as bad, because you are running the whole shader 3x per pixel.
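For a rough sense of scale, here is a back-of-the-envelope sketch in Java; the resolution, overdraw factor, and pixel format are made-up example numbers, not measurements. The point is just that both the number of fragment shader runs and the blending bandwidth scale linearly with the overdraw factor.

```java
// Back-of-the-envelope overdraw cost estimate with hypothetical numbers.
public class OverdrawCost {
    public static void main(String[] args) {
        int width = 1080, height = 1920;   // example screen size
        double overdraw = 3.0;             // average layers drawn per pixel
        long pixels = (long) width * height;

        // Every extra layer re-runs the fragment shader for that pixel.
        long fragmentRuns = (long) (pixels * overdraw);

        // With blending, each layer also reads the destination pixel before
        // writing it back, so framebuffer traffic scales the same way.
        int bytesPerPixel = 4;             // RGBA_8888
        long blendBandwidth = fragmentRuns * bytesPerPixel * 2; // read + write

        System.out.printf("Pixels on screen:     %,d%n", pixels);
        System.out.printf("Fragment shader runs: %,d (%.0fx)%n", fragmentRuns, overdraw);
        System.out.printf("Blend traffic/frame:  ~%,d bytes%n", blendBandwidth);
    }
}
```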
Android does not use the depth buffer; the UI toolkit draws back to front. We are thinking about ways to improve this, but most apps draw blended primitives, which have to be composited back to front. An optimization I want to get in is to re-sort the render tree to batch commands by type and state. A side effect of this will be the ability to cull invisible primitives.
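To illustrate what that re-sorting could buy, here is a hypothetical sketch; the class and field names are mine, not anything from the actual Android renderer. Given a back-to-front command list, it drops commands fully covered by a later opaque command, then merges consecutive commands sharing the same GPU state into batches (it deliberately never reorders across states, since that would need an overlap check to keep blending correct).

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical render-command culling/batching sketch, not actual Android code.
final class RenderBatcher {
    static final class DrawCommand {
        final int stateKey;      // e.g. hash of shader + texture + blend mode
        final boolean opaque;
        final Rectangle bounds;

        DrawCommand(int stateKey, boolean opaque, Rectangle bounds) {
            this.stateKey = stateKey;
            this.opaque = opaque;
            this.bounds = bounds;
        }
    }

    /** Drops commands fully covered by a later opaque command (input is back to front). */
    static List<DrawCommand> cullHidden(List<DrawCommand> backToFront) {
        List<DrawCommand> visible = new ArrayList<>();
        for (int i = 0; i < backToFront.size(); i++) {
            DrawCommand cmd = backToFront.get(i);
            boolean hidden = false;
            for (int j = i + 1; j < backToFront.size(); j++) {
                DrawCommand above = backToFront.get(j);
                if (above.opaque && above.bounds.contains(cmd.bounds)) {
                    hidden = true;
                    break;
                }
            }
            if (!hidden) visible.add(cmd);
        }
        return visible;
    }

    /** Groups consecutive same-state commands so fewer state changes are issued. */
    static List<List<DrawCommand>> batchByState(List<DrawCommand> commands) {
        List<List<DrawCommand>> batches = new ArrayList<>();
        for (DrawCommand cmd : commands) {
            if (batches.isEmpty()
                    || batches.get(batches.size() - 1).get(0).stateKey != cmd.stateKey) {
                batches.add(new ArrayList<>());
            }
            batches.get(batches.size() - 1).add(cmd);
        }
        return batches;
    }
}
```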
The stencil buffer is not used at the moment (well... that's actually how overdraw debugging is implemented) because the hardware renderer only supports rectangular clipping regions and thus relies on the scissor test instead. Given how the original 2D API was designed, using the stencil buffer for clipping could eat up quite a bit of bandwidth or require a rather complex implementation.
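For illustration, this is roughly how a rectangular clip can be mapped onto the scissor test; it assumes top-left-origin view coordinates and is not the renderer's actual code. The whole clip costs one glScissor call, with rejection happening before any blending.

```java
import android.graphics.Rect;
import android.opengl.GLES20;

// Sketch: a rectangular clip expressed as a scissor rect (not actual renderer code).
final class ScissorClip {
    static void apply(Rect clip, int surfaceHeight) {
        GLES20.glEnable(GLES20.GL_SCISSOR_TEST);
        GLES20.glScissor(
                clip.left,
                surfaceHeight - clip.bottom,   // flip Y: GL's origin is bottom-left
                clip.width(),
                clip.height());
    }

    static void clear() {
        GLES20.glDisable(GLES20.GL_SCISSOR_TEST);
    }
}
```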
It is planned to start using the stencil buffer to support non-rectangular clipping regions but this will have a cost.
Remember that the GPU rendering pipeline was written for an API that was never designed to run on the GPU, so some obvious optimizations from traditional rendering engines do not necessarily apply.
That's actually what I expected, but I couldn't find any reference, so I defaulted to the optimistic (but probably wrong) stance, hoping that someone would correct me. Thanks!
This means that, at least on traditional forward-rendering GPUs (Nvidia, Adreno), overdraw carries its full cost even for pixels covered by opaque views. Do the PowerVR chips still get effectively zero-cost opaque overdraw from their tile-based deferred rendering approach?