I was looking at this the other day. I'm pretty sure OpenAI runs the internal reasoning through a model that purges it, which makes the output worse for training other models on.
I might be mistaken, but wasn't the reasoning originally fully hidden? Or maybe it was just far more aggressively purged. I agree that today the reasoning output seems higher quality than it was originally.