You've put your finger on it. The algorithm I implemented is the backprojection step of tomographic reconstruction. It is memory intensive, and I used texture memory for storage. The computation itself is very light.
I expect that video encoding follows the same pattern. One of the critical aspects of benefiting from GPU parallelism is the amount of state each thread has to maintain. It has to be kept to a minimum because the on-chip space that holds it is limited. If threads need more space, the number of active threads is reduced to fit the requirement.
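To put rough numbers on that trade-off, here is a small Python sketch of how per-thread state caps the number of active threads. The limits in it (65,536 registers and 2,048 resident threads per multiprocessor, warps of 32 threads) are illustrative assumptions, not figures for any particular card, and the function name `active_threads_per_sm` is made up for this example. It also ignores shared memory and register allocation granularity, so treat it as an upper bound only.

```python
def active_threads_per_sm(regs_per_thread,
                          regs_per_sm=65536,        # assumed register file size
                          max_threads_per_sm=2048,  # assumed hardware thread limit
                          warp_size=32):            # assumed warp width
    """Upper bound on resident threads when registers are the limiting resource."""
    # How many threads fit in the register file at this per-thread cost.
    by_registers = regs_per_sm // regs_per_thread
    # Threads are scheduled in whole warps, so round down to a warp multiple.
    by_registers = (by_registers // warp_size) * warp_size
    # The hardware thread limit still applies when registers are plentiful.
    return min(by_registers, max_threads_per_sm)


for regs in (32, 64, 128, 255):
    threads = active_threads_per_sm(regs)
    print(f"{regs:3d} registers/thread -> {threads:4d} active threads")
```

Under these assumptions, doubling the state per thread from 32 to 64 registers halves the number of threads that can be resident at once, which leaves fewer threads available to hide memory latency.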