It requires more than 1100 bits to do correct argument reduction for double, actually.
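To make the point concrete, here is a rough Python sketch (mine, not taken from any particular libm) of what goes wrong when you reduce against a 53-bit value of 2π and what changes when you carry far more bits. The helper names `pi_fixed`, `reduce_naive` and `reduce_exact` are hypothetical, and the 1200-bit default is simply an assumption chosen to sit comfortably above the 1100-bit figure. Production libraries do not use big rationals like this; they typically keep a precomputed table of bits of 2/π. The sketch only illustrates why the extra bits matter.

```python
import math
from fractions import Fraction


def _arctan_inv(n, bits):
    """Fixed-point arctan(1/n), scaled by 2**bits, via the Taylor series."""
    term = (1 << bits) // n          # 1/n in fixed point
    total = term
    n2 = n * n
    k = 1
    sign = -1
    while term:
        term //= n2                  # next odd power of 1/n
        k += 2
        total += sign * (term // k)
        sign = -sign
    return total


def pi_fixed(bits):
    """Return pi * 2**bits as an integer (truncated), using Machin's formula."""
    guard = 64                       # extra bits absorb truncation error
    b = bits + guard
    p = 16 * _arctan_inv(5, b) - 4 * _arctan_inv(239, b)
    return p >> guard


def reduce_naive(x):
    """Reduce x modulo the double nearest to 2*pi (only 53 bits of pi)."""
    return math.fmod(x, 2.0 * math.pi)


def reduce_exact(x, bits=1200):
    """Reduce x modulo 2*pi using a `bits`-bit rational approximation of pi."""
    two_pi = Fraction(2 * pi_fixed(bits), 1 << bits)
    fx = Fraction(x)                 # exact value of the double
    k = fx // two_pi                 # whole number of periods
    return float(fx - k * two_pi)    # remainder in [0, 2*pi)


if __name__ == "__main__":
    x = 1e22                         # exactly representable as a double
    print("naive reduction :", math.sin(reduce_naive(x)))
    print("wide reduction  :", math.sin(reduce_exact(x)))
    print("platform libm   :", math.sin(x))
```

For x = 1e22 the input spans roughly 1.6e21 periods, so the tiny error in the double value of 2π is multiplied by that count and the naive remainder is meaningless. The wide reduction should land on the commonly quoted reference value sin(1e22) ≈ -0.8522008497671888.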
You're absolutely correct that a software implementation may be several times faster than the legacy x87 fsin instruction, while delivering well-rounded results. There shouldn't be a need to write your own implementation, however. High-quality library implementations are pretty widely available these days.
Agreed. I'm stunned that there is a compiler currently in existence that actually uses the built-in Intel transcendentals rather than its own library.