Given that the fixed table is a much simpler one (by letting out-of-bounds just ...

phire · 2024-12-29T05:38:18 1735450698

It feels like the kind of optimization that gets missed because the task was split between multiple people, and nobody had complete knowledge of the problem.

The person generating the table didn't realize that filling the out-of-bounds entries with 2 would make for a simpler PLA. And the person squishing the table into the PLA didn't realize the zeros were "don't care" and assumed they needed to be preserved.

It's also possible they simply stopped optimizing as soon as they felt the PLA was small enough for their needs. If they had already done the floorplanning, making the PLA even smaller wasn't going to make the chip any smaller, and their engineering time would be better spent elsewhere.

justinator · 2024-12-30T01:52:08 1735523528

It's hard to believe that people collaborating on something this important to the company aren't, like, in a meeting at least weekly talking about implementation details like this.

The other thing that's hard for me to believe is there wasn't an extensive and mostly automated QA process that would test absolutely every little feature of this CPU.

ajross · 2024-12-29T01:50:43 1735437043

"Make it work first before you make it work fast". Fundamentally this is a software problem solved with software techniques. And like most software there's some optimization left on the table just because no one thought of it in time. And you can't patch a CPU of this era.

kens · 2024-12-29T01:47:20 1735436840

Returning 0 for undefined table entries is the obvious thing to do. Setting these entries to 2 is a bit of a conceptual leap, even though it would have prevented the FDIV error and it makes the PLA simpler. So I can't fault Intel for this.
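A hypothetical toy model makes the point concrete. It assumes a single table column whose truly reachable rows need the digits in `NEEDED`, and a generator with an off-by-one boundary that treats the last reachable row as undefined. The row count, digits, and boundary are invented for illustration; this is not the Pentium's actual table layout.

```python
# Hypothetical toy model of ONE table column; not Intel's actual layout.
NEEDED = [0, 0, 1, 1, 2, 2]   # digits required by truly reachable rows 0..5
ROWS = 8                      # rows 6..7 are genuinely unreachable


def build_column(reachable: int, fill: int) -> list[int]:
    """Copy NEEDED for rows believed reachable; use `fill` for every row above."""
    return [NEEDED[r] if r < reachable else fill for r in range(ROWS)]


def wrong_rows(column: list[int]) -> list[int]:
    """Truly reachable rows whose entry differs from the digit they need."""
    return [r for r, want in enumerate(NEEDED) if column[r] != want]


# Off-by-one in the generator: it believes only rows 0..4 are reachable.
for fill in (0, 2):
    column = build_column(reachable=len(NEEDED) - 1, fill=fill)
    print(f"fill={fill}: {column} wrong rows: {wrong_rows(column)}")
```

With a fill of 0, the misclassified row 5 silently returns 0; with a fill of 2, it returns the 2 it needed anyway, and the top region of the column simply extends upward instead of dropping back to 0.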

Sniffnoy · 2024-12-29T03:47:19 1735444039

It's not really a conceptual leap if you've ever had to work with "don't care" cases before...

mjevans · 2024-12-29T08:08:33 1735459713

It's a NULL / 'do not care' issue. 0 isn't a reserved out-of-band value; it's payload data, and anything beyond the bounds should have been marked don't-care (DNC).

It's possible that some other arrangement, likely aligned to an easy binary multiple, would still produce a square block of 2s, and that letting the far edges float to whatever value suits the optimizer could yield a slightly more compact logic array. Back-filling the entire side with the clamped upper value doesn't cost much more, though, and is known to solve the issue. As pointed out elsewhere, that sort of solution would also save engineering time, fit within the planned space budget, and, best of all, reduce cognitive load: it's obviously correct when you look at the bug.
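A small sketch can compare the three options on a toy table. It uses Quine-McCluskey (an exact two-level minimizer, swapped in here for Espresso's heuristic) on a hypothetical 16-entry table indexed by a 4-bit value ABCD. It assumes only indices 0 to 9 are reachable and looks at one output bit (say, the bit that is set when the digit is 2), which is 1 for indices 5 to 9. Indices 10 to 15 are then forced to 0, back-filled with the clamped value, or left as don't-care. Product terms and literals are only a rough proxy for PLA size.

```python
from itertools import combinations

NBITS = 4
Cube = tuple[int, int]  # (value, mask); set mask bits are "-" positions


def prime_implicants(on: set[int], dc: set[int]) -> set[Cube]:
    """Quine-McCluskey: merge cubes differing in one bit until nothing merges."""
    current: set[Cube] = {(m, 0) for m in on | dc}
    primes: set[Cube] = set()
    while current:
        merged: set[Cube] = set()
        used: set[Cube] = set()
        for (v1, m1), (v2, m2) in combinations(current, 2):
            diff = v1 ^ v2
            # Same "-" positions and exactly one differing bit: merge them.
            if m1 == m2 and bin(diff).count("1") == 1:
                merged.add((v1 & ~diff, m1 | diff))
                used.update({(v1, m1), (v2, m2)})
        primes |= current - used  # cubes that merged with nothing are prime
        current = merged
    return primes


def literals(mask: int) -> int:
    """Number of input literals in a cube: the positions that are not '-'."""
    return NBITS - bin(mask).count("1")


def covers(cube: Cube, x: int) -> bool:
    value, mask = cube
    return (x & ~mask) == value


def minimal_cover(on: set[int], dc: set[int]) -> list[Cube]:
    """Fewest primes covering every ON minterm, ties broken by literal count."""
    if not on:
        return []
    primes = sorted(prime_implicants(on, dc))
    for k in range(1, len(primes) + 1):
        candidates = [
            combo for combo in combinations(primes, k)
            if all(any(covers(c, x) for c in combo) for x in on)
        ]
        if candidates:
            best = min(candidates, key=lambda combo: sum(literals(m) for _, m in combo))
            return list(best)
    raise AssertionError("unreachable: the full prime set always covers ON")


def term(cube: Cube) -> str:
    """Render a cube as a product term, with A as the most significant bit."""
    value, mask = cube
    parts = []
    for i, name in enumerate("ABCD"):
        bit = 1 << (NBITS - 1 - i)
        if not (mask & bit):
            parts.append(name if value & bit else name + "'")
    return "".join(parts) or "1"


ON_VALID = set(range(5, 10))       # hypothetical: reachable cells where the bit is 1
OUT_OF_RANGE = set(range(10, 16))  # hypothetical: indices that never occur

STRATEGIES = {
    "fill with 0": (ON_VALID, set()),
    "fill with clamped value": (ON_VALID | OUT_OF_RANGE, set()),
    "don't care": (ON_VALID, OUT_OF_RANGE),
}

for name, (on, dc) in STRATEGIES.items():
    cover = minimal_cover(on, dc)
    sop = " + ".join(sorted(term(c) for c in cover))
    lits = sum(literals(m) for _, m in cover)
    print(f"{name:24} {sop:22} terms={len(cover)} literals={lits}")
```

In this toy, forced zeros need A'BC + A'BD + AB'C' (9 literals), while back-filling and don't-care both reach A + BC + BD (5 literals); all three use 3 product terms. A table whose out-of-range region touches more than one output value is where leaving it as don't-care could beat back-filling.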

garaetjjte · 2024-12-30T12:23:22 1735561402

I would have expected the out-of-bounds entries to be specified as "don't care" instead of having a value picked manually. I guess optimizer software like Espresso should allow for that?
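Espresso's input format does appear to support this directly. A sketch, assuming the Berkeley PLA format as described in the espresso(5) man page: with `.type fdr`, a `1` in the output column puts the row in the ON-set, a `0` puts it in the OFF-set, and a `-` puts it in the don't-care set. The table is a hypothetical 4-input toy (indices 0 to 9 reachable, output 1 for 5 to 9), not the Pentium's table, and `to_pla` is a made-up helper name.

```python
def to_pla(on: set[int], dc: set[int], nbits: int) -> str:
    """Emit a single-output truth table in Berkeley PLA format for espresso.

    Rows in `dc` get '-' (don't care), rows in `on` get '1', the rest '0'.
    """
    lines = [f".i {nbits}", ".o 1", ".type fdr"]
    for x in range(2 ** nbits):
        inputs = format(x, f"0{nbits}b")  # most significant bit first
        output = "-" if x in dc else "1" if x in on else "0"
        lines.append(f"{inputs} {output}")
    lines.append(".e")
    return "\n".join(lines) + "\n"


ON_VALID = set(range(5, 10))       # hypothetical reachable ON rows
OUT_OF_RANGE = set(range(10, 16))  # hypothetical never-occurring indices

if __name__ == "__main__":
    print(to_pla(ON_VALID, OUT_OF_RANGE, nbits=4), end="")
```

Redirecting the output to a file (for example `toy.pla`, a hypothetical name) and running `espresso toy.pla` should leave the minimizer free to assign rows 10 to 15 whatever values shrink the cover.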

lizzas · 2024-12-29T02:37:08 1735439828

That must have been such a satisfying fix for the engineers though!

jandrese · 2024-12-29T04:06:10 1735445170

More engineering time resulted in a more efficient solution.