x87 uses 80-bit floating point. Those values literally don't fit in SSE's 64-bit doubles.
The extra bits need to be emulated.
EDIT: And I'm sure there's some program out there that actually relies on those extra 16 bits (a 64-bit significand versus the 53 bits of precision in a double), and its users would be pissed if their least-significant bit picked up a fraction of a bit more error per operation.
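For anyone who wants to see the width difference concretely, here is a rough C sketch. It assumes `long double` maps to the x87 80-bit format, which is typical for GCC and Clang on x86 Linux but not guaranteed (MSVC, for example, treats `long double` as a 64-bit double). The function names are made up for illustration.

```c
#include <float.h>

/* Significand precision of double, in bits (53 for IEEE 754 binary64). */
int double_significand_bits(void) { return DBL_MANT_DIG; }

/* Significand precision of long double, in bits.
 * Assumption: 64 when long double is the x87 extended format;
 * other ABIs may report 53 or 113 instead. */
int long_double_significand_bits(void) { return LDBL_MANT_DIG; }

/* Extra significand bits long double carries over double on this platform. */
int extra_significand_bits(void) { return LDBL_MANT_DIG - DBL_MANT_DIG; }

/* Returns 1 if 1 + 2^-60 is distinguishable from 1 in double.
 * volatile forces the sum to be stored at double width, so any
 * excess intermediate precision is rounded away first. */
int tiny_increment_survives_double(void) {
    volatile double one = 1.0;
    volatile double tiny = 0x1p-60;
    volatile double sum = one + tiny;
    return sum != one;
}

/* Same check in long double. With a 64-bit significand the increment
 * is representable; with a 53-bit long double it is rounded away. */
int tiny_increment_survives_long_double(void) {
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum != one;
}
```

On a platform where the assumption holds, `extra_significand_bits()` should report 11, and only the `long double` version of the increment check should return 1. The remaining extra bits go to the wider exponent and the explicit integer bit.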
They are not emulated; they run at optimal latency (in fact, on Ice Lake FADD has better latency than ADDSD!), although at lower throughput because there are fewer dedicated execution units.
That's a strong point. I guess they really aren't emulated then.
That really makes me wonder how the 80 bits are stored, then. I guess the "stack" is just part of the register-renaming mechanism? Huh... AVX registers are 256 bits wide, so I guess 80 bits fit in each one.
Yes, the x87 stack per se doesn't exist anymore; it is mapped onto the general register file. I have no idea how the 80 bits are handled. I thought that the AVX registers mapped to multiple entries in the file, but maybe I'm wrong.