
At a basic level, anything with a working set on the order of 360 MB should benefit from 360 MB of combined L3 with a worst-case latency of 11.5 ns, regardless of which parts end up in which L2 slice (and the cache allocation heuristics described in the article look pretty smart to me). Similarly, if you have a total working set of a couple of GB then the 2.8 GB combined L4 at 48.5 ns latency should be great. Is there any other hardware on the market that can offer so much memory at such a low latency?
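
A toy model of that reasoning, using only the capacities and worst-case latencies quoted above. It assumes the whole combined capacity is available to one workload. It ignores sharing with other cores, associativity and the allocation heuristics, so treat it as a back-of-the-envelope check. The names `LEVELS` and `smallest_fitting_level` are hypothetical.

```python
# Combined capacities (decimal bytes) and worst-case latencies (ns) as quoted
# in the comment above. Main-memory latency is not given in the thread.
LEVELS = [
    ("virtual L3", 360_000_000, 11.5),
    ("virtual L4", 2_800_000_000, 48.5),
]

def smallest_fitting_level(working_set_bytes):
    """Return (level name, worst-case latency in ns) for the first level whose
    combined capacity holds the entire working set. Falls back to main memory,
    whose latency is unknown here and returned as None."""
    for name, capacity_bytes, latency_ns in LEVELS:
        if working_set_bytes <= capacity_bytes:
            return name, latency_ns
    return "main memory", None
```

For example, a 1 GB working set lands in the virtual L4 at 48.5 ns worst case, and anything over 2.8 GB falls through to main memory.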



/Uneducated/ question: these latency numbers seem large to me. The DDR5 memory sticks I browsed yesterday for a home PC listed a first-word latency of 10 ns.


If the data is not in cache, it takes considerably longer between the CPU core issuing a load instruction and the result reaching the next instruction that needs it. The core first has to miss in L1 and L2, do a TLB lookup to translate the virtual address to a physical one, and send a request to L3 over the on-chip interconnect. Once the L3 lookup also misses, the memory controller has to fetch a 64-byte cache line from main memory, and the data then has to travel back to the core...
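
As far as I know, the ~10 ns figure on a DDR5 listing is the CAS latency converted to time. CL is counted in memory-clock cycles, and that clock runs at half the transfer rate. So the figure covers only the DRAM's own column access after a row is open, not any of the core-side steps above. A minimal conversion, assuming that is what the listing meant. The function name is hypothetical, and DDR5-6000 CL30 is just an example kit.

```python
def cas_latency_ns(cl_cycles, transfer_rate_mts):
    """Convert CAS latency (memory-clock cycles) to nanoseconds.

    DDR moves data twice per clock, so the memory clock is transfer_rate / 2 MHz
    and one cycle lasts 2000 / transfer_rate_mts nanoseconds.
    """
    return cl_cycles * 2000 / transfer_rate_mts

print(cas_latency_ns(30, 6000))  # DDR5-6000 CL30
print(cas_latency_ns(40, 4800))  # DDR5-4800 CL40
```

Everything else on the path (the cache misses, interconnect hops, memory-controller queueing and often a row activation before the column read) comes on top of that number.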

Have a look at the section "Cache setup" at https://chipsandcheese.com/2024/08/14/amds-ryzen-9950x-zen-5... for some real-world latency values. Once we're talking about a 100+ MB working set (i.e. DDR5 instead of cache), a top-of-the-line Ryzen 9950X has an access latency of about 100 ns. There is also some older data for a wider variety of CPUs at https://chipsandcheese.com/memory-latency-data/ - and there the older IBM z15 is in a class of its own.
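
Latency figures like these are commonly measured with a pointer-chasing test. Each load's address comes from the previous load, so the core cannot overlap them, and a random order keeps the prefetchers from guessing the next line. Below is a minimal sketch of building and walking such a chain. It assumes one random cycle generated with Sattolo's algorithm and one node per 64-byte line. I am not claiming this is how the chipsandcheese tool is written, and `make_chase_chain`, `chase` and `slots_for_working_set` are hypothetical names. Timing this walk in Python would measure interpreter overhead rather than memory latency, so a real test does the walk in C or similar and divides elapsed time by the step count.

```python
import random

def make_chase_chain(n, seed=0):
    """Build next_idx so that following i -> next_idx[i] visits all n slots
    in one random cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    next_idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i (never j == i): guarantees a single cycle
        next_idx[i], next_idx[j] = next_idx[j], next_idx[i]
    return next_idx

def chase(next_idx, steps, start=0):
    """Follow the chain; each step needs the result of the previous lookup."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i

def slots_for_working_set(working_set_bytes, line_bytes=64):
    """Number of nodes when each node sits in its own cache line."""
    return working_set_bytes // line_bytes
```

Sweeping the chain size from a few KB to well past 100 MB is what moves the measurement from L1 out to DRAM.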



