Interesting read, one thing I don’t understand is how much space does loop buffe...

akira2501 · 2024-11-30T22:40:54 1733006454

I think most modern chips are routing constrained and not floorspace constrained. You can build tons of features but getting them all power and normalized signals is an absolute chore.

Remnant44 · 2024-11-30T21:58:01 1733003881

My understanding is that it's a pretty small optimization on the front end. It doesn't have a lot of entries to begin with (144) so the amount of space saved is probably negligible. Theoretically, the loop buffer would let you save power or improve performance in a tight loop. In practice, it doesn't seem to do either, and AMD removed it completely for Zen 5.

atq2119 · 2024-11-30T23:53:57 1733010837

Judging from the diagrams, the loop buffer is using the same storage as the micro-op queue that's there anyway. If that is accurate (and it does seem plausible), then the area cost is just some additional control logic. I suspect the most expensive part is detecting a loop in the first place, but that's probably quite small compared to the size of the queue.

progbits · 2024-11-30T21:57:57 1733003877

It says 144 micro-op entries per core. Not sure how many bytes that is, but L2 caches these days are around 1MB per core, so assuming the loop buffer die space is mostly storage (sounds like it) then it wouldn't make a notable difference.