The article mentions two: unpacking 24-bit values and matrix multiplication. Both might be something you'd find in a software video (de)compressor or a computer game.
Games, at least the ones I've seen, usually just don't use 3x(8|16|32) bits per pixel; they pad to four channels even when the fourth isn't needed for alpha. It's pretty rare to pack 24/48/96-bit pixels in memory. Traditionally, the unaligned loads just haven't been worth it.
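To make the alignment point concrete: in a packed 24-bit row, pixel i starts at byte 3*i, so three out of every four pixels begin at an offset that is not a multiple of 4. This is a minimal Python sketch, not code from the article; the function names are hypothetical and R-first byte order is an assumption. It expands such a row into the padded 32-bit 0x00RRGGBB layout games tend to use instead.

```python
def unpack_rgb24(buf: bytes) -> list:
    """Expand packed 24-bit RGB (3 bytes per pixel, R first: an assumption)
    into padded 32-bit 0x00RRGGBB words."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    return [int.from_bytes(buf[i:i + 3], "big") for i in range(0, len(buf), 3)]


def misaligned_pixels(n: int) -> list:
    """Indices of pixels whose first byte is not 4-byte aligned,
    assuming the row itself starts on a 4-byte boundary."""
    return [i for i in range(n) if (3 * i) % 4 != 0]
```

With 3 bytes per pixel the start offsets cycle through 0, 3, 6, 9 (mod 4: 0, 3, 2, 1), so only every fourth pixel can be fetched with a single aligned 32-bit load.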
Sometimes packed formats were worth it, though. I remember doing RLE schemes for fast "alpha blending" (anti-aliased blitting, really!) back in 2000 or so, before MMX or GPUs were common. The scheme I used had three types of runs: completely transparent, pre-multiplied alpha-blended, and completely opaque. The 16-bit pre-multiplied values were stored in memory in a 32-bit format like this: URGBU00A, where each letter stands for a group of 5 bits, except U, which denotes a single unused bit. The frame buffer in RAM was in 15-bit format, 0RRRRRGGGGGBBBBB. With this scheme, alpha blending needed just one multiply per pixel in total [1], compared to three with the usual pre-multiplied alpha or six with normal alpha without pre-multiplication.
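A minimal Python sketch of the run-length scheme, using the straightforward per-channel blend (three multiplies per pixel). It is not the original code: the run encoding as (kind, length, words) tuples, the helper names, and treating A as the weight applied to the frame buffer pixel (scaled by A/32) are all assumptions. Note that the upper 16 bits of a URGBU00A word are already in the frame buffer's 0RRRRRGGGGGBBBBB layout, so opaque runs are a plain copy.

```python
# Hypothetical run-kind codes for the three run types.
TRANSPARENT, BLEND, OPAQUE = 0, 1, 2


def pack555(r: int, g: int, b: int) -> int:
    """Pack 5-bit channels (0..31) into the frame buffer format 0RRRRRGGGGGBBBBB."""
    return (r << 10) | (g << 5) | b


def pack_src(r: int, g: int, b: int, a: int) -> int:
    """Pack a source word as URGBU00A: pre-multiplied 555 color in the upper
    16 bits, 5-bit A in the lowest 5 bits, U bits and 0 groups left clear."""
    return (pack555(r, g, b) << 16) | a


def blend_per_channel(dst: int, word: int) -> int:
    """Pre-multiplied blend with one multiply per channel (three per pixel).

    Assumes A is the weight applied to the frame buffer pixel (scaled by A/32)
    and that each source channel is <= 31 - A, so no channel exceeds 31.
    """
    color, a = (word >> 16) & 0x7FFF, word & 0x1F
    out = 0
    for shift in (10, 5, 0):  # R, G, B field offsets
        d = (dst >> shift) & 0x1F
        s = (color >> shift) & 0x1F
        out |= (s + ((d * a) >> 5)) << shift
    return out


def blit_scanline(dst: list, x: int, runs: list) -> None:
    """Draw one RLE-encoded scanline into dst (15-bit pixels) starting at column x.

    Each run is a (kind, length, words) tuple; words holds URGBU00A source
    words and is ignored for TRANSPARENT runs.
    """
    for kind, length, words in runs:
        if kind == OPAQUE:
            for i in range(length):
                dst[x + i] = (words[i] >> 16) & 0x7FFF  # already in 555 layout
        elif kind == BLEND:
            for i in range(length):
                dst[x + i] = blend_per_channel(dst[x + i], words[i])
        # TRANSPARENT runs only advance the column.
        x += length
```

For example, blending pack_src(4, 4, 4, 16) over a white pixel gives 4 + ((31 * 16) >> 5) = 19 in each channel, while transparent runs cost nothing beyond advancing x.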
[1]: Frame buffer pixels were masked and shifted into 000000GGGGG000000RRRRR00000BBBBB, multiplied by A, and shifted back into place. Because all values were 5-bit entities, each product fit in 10 bits and no overflow (overlap) could occur in the multiply, so one multiply yielded all three scaled frame buffer values, for R, G and B!
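A minimal Python sketch of the single-multiply trick from the footnote, checked against a per-channel reference. The function names are hypothetical, and "shifted back into place" is assumed here to mean keeping the top 5 bits of each 10-bit product (a shift by 5, i.e. dividing by 32, which approximates A/31). Treating A as the weight applied to the frame buffer pixel is also an assumption.

```python
def scale555_per_channel(p: int, a: int) -> int:
    """Reference: scale each channel of a 0RRRRRGGGGGBBBBB pixel by a/32 (three multiplies)."""
    r, g, b = (p >> 10) & 0x1F, (p >> 5) & 0x1F, p & 0x1F
    return (((r * a) >> 5) << 10) | (((g * a) >> 5) << 5) | ((b * a) >> 5)


def scale555_one_mul(p: int, a: int) -> int:
    """Same result with a single multiply, using the spread-out layout from the footnote."""
    # R (bits 10-14) and B (bits 0-4) stay put; G moves from bits 5-9 to 21-25,
    # giving 000000GGGGG000000RRRRR00000BBBBB.
    x = (p & 0x7C1F) | ((p & 0x03E0) << 16)
    # Each 5-bit x 5-bit product needs at most 10 bits, and every field has
    # at least 5 zero bits above it, so the three products cannot overlap.
    y = (x * a) & 0xFFFFFFFF  # stays within 32 bits; the mask just documents it
    # Keep the top 5 bits of each 10-bit product and fold G back to bits 5-9.
    y >>= 5
    return (y & 0x7C1F) | ((y >> 16) & 0x03E0)


def blend_one_mul(dst: int, word: int) -> int:
    """Pre-multiplied blend: add the source color to the scaled frame buffer pixel.

    Assumes word is URGBU00A and each source channel is <= 31 - A, so the
    single 15-bit add cannot carry from one channel into the next.
    """
    return ((word >> 16) & 0x7FFF) + scale555_one_mul(dst, word & 0x1F)
```

The spacing is what makes this work: B's product lands in bits 0-9, R's in 10-19 and G's in 21-30, so after the shift the masks 0x7C1F and 0x03E0 pick out exactly the three scaled channels.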