The result doesn't really surprise me: many operations are bound by memory bandwidth rather than by CPU speed. There is even a compressor, Blosc [1], that speeds up operations by moving compressed data between main memory and the L1 cache and (de)compressing it there, instead of moving the uncompressed data.
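Here is a minimal Python sketch of that data flow. It is not Blosc itself. It uses the standard-library zlib as a stand-in codec, and the 32 KiB block size and the helper names `compress_blocks`/`blocked_sum` are my own illustrative assumptions. The data is stored as independently compressed blocks, and a reduction decompresses one cache-sized block at a time instead of streaming the whole uncompressed array from memory.

```python
import array
import zlib

# Assumed block size: 32 KiB, picked to fit a typical L1 data cache.
BLOCK_BYTES = 32 * 1024


def compress_blocks(values: array.array, block_bytes: int = BLOCK_BYTES) -> list[bytes]:
    """Split a typed array into fixed-size byte blocks and compress each one.

    zlib (level 1, fast) is only a stand-in for Blosc's own codecs."""
    raw = values.tobytes()
    return [zlib.compress(raw[i:i + block_bytes], 1)
            for i in range(0, len(raw), block_bytes)]


def blocked_sum(blocks: list[bytes], typecode: str = "d") -> float:
    """Sum the data by decompressing one block at a time.

    Only the compressed bytes are read from the big buffer; each
    decompressed block is small enough to stay in cache while it is summed."""
    total = 0.0
    for blob in blocks:
        chunk = array.array(typecode)
        chunk.frombytes(zlib.decompress(blob))
        total += sum(chunk)
    return total


if __name__ == "__main__":
    data = array.array("d", (float(i % 100) for i in range(1_000_000)))
    blocks = compress_blocks(data)
    compressed = sum(len(b) for b in blocks)
    print(blocked_sum(blocks), compressed, len(data.tobytes()))
```

Don't expect this to be faster in pure Python. The win only appears when the codec can decompress faster than memory can deliver the uncompressed bytes, which is what Blosc's C implementation is built for.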
[1] http://blosc.pytables.org/