
The article mentions two: unpacking 24-bit values and matrix multiplication. Both might be something you'd find in a software video (de)compressor or a computer game.



Where did you get 24 from? It's about vectorizing loads of 3 elements. The elements don't have to be 8-bit. The second example uses floats.


Yeah, but 24-bit values (RGB) are what you'd find in video and gaming applications.


Games, at least the ones I've seen, usually don't use 3x(8|16|32) bits per pixel, even when the fourth channel isn't needed for alpha. It's pretty rare to pack 24/48/96-bit pixels in memory. Traditionally, unaligned loads are just not worth it.

Sometimes packed formats were worth it. I remember doing RLE schemes for fast "alpha blending" (anti-aliased blitting, really!) back in 2000 or so, before MMX or GPUs were common. The scheme I used had three different types of runs: completely transparent, pre-multiplied alpha-blended, and completely opaque. The 16-bit pre-multiplied values were stored in memory in a 32-bit format like this: URGBU00A, where each symbol is a group of 5 bits, except U, which denotes a single unused bit. The frame buffer in RAM was in 15-bit format, 0RRRRRGGGGGBBBBB. With this scheme, alpha blending needed just one multiply per pixel in total [1], compared to three with the usual pre-multiplied alpha or six with normal alpha without pre-multiplication.

[1]: Frame buffer pixels were masked and shifted into 000000GGGGG000000RRRRR00000BBBBB, multiplied by A, and shifted back into place. Because all values were 5-bit quantities, no overflow (overlap) could occur in the multiply, so one multiply yielded three multiplied frame buffer values, for R, G and B!
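
For anyone who wants to try the trick in [1], here is a minimal C sketch of the single-multiply step. It is my reconstruction from the bit layouts given above, not the original code: the function name `scale555`, the mask constant, and the choice to divide by 32 (shift right by 5) after the multiply are all assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: scales all three channels of a 15-bit pixel
 * (0RRRRRGGGGGBBBBB) by a 5-bit factor a (0..31) with one multiply.
 * Each channel ends up as (channel * a) >> 5. */
static uint16_t scale555(uint16_t pixel, uint32_t a)
{
    /* Bits kept after spreading: G at 21..25, R at 10..14, B at 0..4,
     * i.e. 000000GGGGG000000RRRRR00000BBBBB. */
    const uint32_t SPREAD_MASK = 0x03E07C1Fu;

    uint32_t p = pixel & 0x7FFFu;   /* ignore the unused top bit */
    a &= 0x1Fu;                     /* factor is a 5-bit quantity */

    /* Copy the pixel into the upper half, then mask so that G lives
     * in the upper half while R and B stay in the lower half. */
    uint32_t spread = (p | (p << 16)) & SPREAD_MASK;

    /* One multiply scales R, G and B at once. A 5-bit value times a
     * 5-bit value needs at most 10 bits, and each field has at least
     * 10 bits of room, so the products cannot overlap. */
    uint32_t scaled = spread * a;

    /* Keep the top 5 bits of each 10-bit product. */
    scaled = (scaled >> 5) & SPREAD_MASK;

    /* Fold G back between R and B. */
    return (uint16_t)((scaled | (scaled >> 16)) & 0x7FFFu);
}
```

For example, `scale555(0x7E08, 16)` halves each channel of (R=31, G=16, B=8) and gives 0x3D04, i.e. (R=15, G=8, B=4). A blend would presumably then add the pre-multiplied source pixel to this result, which is safe only if the per-channel sums stay within 5 bits.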


Presumably loading a 3D vector would be a good use-case?



