Hacker News new | past | comments | ask | show | jobs | submit login

all ARM-designed cores are also 64-bytes. It's not just an x86 thing



Some Cortex-A53s have 16-byte cachelines, which I found out the hard way recently.


The Cortex A9 had 32 byte cache lines for one prominent counterexample.

But my point was more that the size is baked into x86 in a pretty deep way these days. You'd be looking at new releases from all software that cares about such things on x86 to support a different cache line size without major perf regressions. So all of the major kernels, probably the JVM and CLR, game engines (and good luck there).

IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.


> IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.

CPUID EAX=1, bits 8-15 (i.e., second byte) of EBX in the result tell you the cache line size. It's been there since Pentium 4, apparently.

You can also get line size for each cache level with CPUID EAX=4, along with the set-associativity and other low-level cache parameters.


> IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.

This doesn't help.

The issue with increasing cache line size is false sharing. And false sharing isn't something that's only dealt with by people who know or care about cache line width. The problem is that mostly, people just write oblivious code, then test it, and if it is slow, possibly do something about it. A way to query cache line width gets the infinitesimally small portion of cases where someone actually consciously padded structs to cache line width, and misses all the cases where things just happen to not fit on the same line with 64B lines and do fit on the same line with 128B lines.

Like it or not, coherency line width = 64B is just a part of the living x86 spec, and won't be changed. If possible, we'd probably wish for it to be larger. But at least we were lucky not to be stuck with something smaller.


> The Cortex A9 had 32 byte cache lines for one prominent counterexample.

Ok, all arm-designed cores for the last 15 years then :)


From a Raspberry Pi 5: L2 cache line is 128

  Vendor ID:              ARM
  Model name:             Cortex-A76

  $ lscpu -C
  NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
  L1d       64K     256K    4 Data            1  256                      64
  L1i       64K     256K    4 Instruction     1  256                      64
  L2       512K       2M    4 Unified         2 1024                     128
  L3         2M       2M   16 Unified         3 2048                      64




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: