I only wish that at $DAY_JOB I ever encountered a performance problem that was as subtle as "oh you're not properly using the L2 cache". It's more like, "why did you choose an N^3 algorithm when you can do it in N log N" or "why is your design to write the same file on disk 1,000 times to perform one operation?" sigh
I wouldn't wish that on anyone, to put it generously. After a long, hard battle, only those with perverse perseverance may find such a solution, and only a few find joy in doing it.
bcantrill (who worked with Bonwick @ Sun) is always touting that paper as a classic of systems research. I finally gave in and bought a copy of the proceedings from Usenix where it was presented. Probably the best paper in a first-rate class of them.
This whole article is some sort of nonsense straw man. Of course if you spread small amounts of data out so that they are all far apart, you will get bad performance. When people talk about 'alignment' they mean 32, 64, or 128 bits for atomics and 128, 256, or 512 bits for SIMD. Putting 8 bytes at the beginning of a 4096-byte region and the next 8 bytes right next to it isn't 'unaligned access'.
A single byte access pulls in a whole 64-byte cache line (and often the adjacent line too, via the spatial prefetcher), so of course it is faster to pack data together, but who thought otherwise? It also saves TLB misses, and data accessed linearly will be prefetched.
This isn't about scanning a table whose elements are allocated contiguously; it's about scanning a linked list whose elements could be anywhere, so the next element is almost never already in the cache. That's bad enough on its own (a cache miss on every step down the list), but if the items are also all page-aligned (each sits at the same offset within its 4K page), then their addresses share a lot of bits, they map to the same cache sets, and they evict each other, slowing your scans down further. If you only scan the list once then it's no big deal, but if this is some frequently accessed kernel data structure (the list of processes or threads or GDI objects or what have you), then it's a big problem.
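To see why page alignment funnels everything into the same cache sets, here's a minimal sketch of the set-index arithmetic. The geometry is an assumption (a typical L1d: 64-byte lines, 64 sets, 8 ways); real CPUs vary, but the effect is the same whenever the set index comes straight from address bits.

```python
# Sketch: with one page-aligned node per 4 KiB page, every node
# lands in the same cache set and they fight over its few ways.
LINE_BITS = 6   # 64-byte cache lines (assumed)
SETS = 64       # assumed: 64 sets x 8 ways x 64 B = 32 KiB L1d

def cache_set(addr):
    # Set index is taken from the address bits just above the line offset.
    return (addr >> LINE_BITS) % SETS

nodes = [n * 4096 for n in range(100)]   # 100 page-aligned list nodes
print({cache_set(a) for a in nodes})     # -> {0}: all 100 nodes compete
                                         # for a single set's 8 ways
```

With 100 nodes chasing 8 ways of one set, a second pass over the list misses on nearly every step even though 99% of the cache sits idle.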
One fix is to store these objects at addresses that don't share so many bits; another is for the CPU to hash the address when choosing a cache set, rather than using the address bits as-is.
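The first fix is what's usually called cache coloring (it's one of the tricks in the Bonwick slab paper mentioned upthread): the allocator staggers each page's objects by a multiple of the line size. A sketch, under the same assumed geometry as above:

```python
# Sketch of cache coloring: offset each page's object by one extra
# cache line so successive nodes use different sets.
LINE = 64       # assumed line size
SETS = 64       # assumed number of L1d sets

def cache_set(addr):
    return (addr >> 6) % SETS

# n-th node sits (n mod SETS) lines into its page instead of at offset 0.
colored = [n * 4096 + (n % SETS) * LINE for n in range(100)]
print(len({cache_set(a) for a in colored}))  # -> 64: nodes now spread
                                             # over every set
```

Same nodes, same one-per-page layout, but the scan now has the whole cache to work with instead of a single set.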