I clearly remember this problem happening in the past, despite their claims. At first I thought it was an elaborate hoax, but in my case it turned out to be factually true.
I tend to think it would be very hard, and very risky, for large, successful companies to lie systematically about these things without getting caught. The people who would have to do the lying here are not professional liars; they're engineers who generally seem trustworthy. So yes, if there is a degradation, I think bugs are much more likely than systematic lying.
The TPU implementation used approximate top-k instead of the exact top-k used on Nvidia GPUs. That alone probably wouldn't matter much (and there was also a bug in it), but not using exact top-k from the beginning was still a cost-saving choice, because exact top-k wasn't efficient on the TPUs that requests were routed to under load. So even aside from the bug, the model behaved a bit differently under load.
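For anyone unfamiliar with the distinction, here is a rough sketch of why approximate top-k can change results. It assumes a simplified bucketed scheme: split the logits into bins, keep only each bin's maximum, then run an exact top-k over the survivors. This is only an illustration of the general idea, not the kernel that was actually deployed, and `exact_top_k`, `approx_top_k`, and `num_bins` are hypothetical names.

```python
import heapq


def exact_top_k(values, k):
    """Return the sorted indices of the k largest values, found exactly."""
    return sorted(heapq.nlargest(k, range(len(values)), key=values.__getitem__))


def approx_top_k(values, k, num_bins):
    """Hypothetical bucketed approximate top-k (illustrative sketch only).

    Element i goes to bin i % num_bins, and only each bin's maximum survives.
    An exact top-k then runs over the survivors. If two of the true top-k
    land in the same bin, one of them is dropped, so recall can be below 1.
    """
    if num_bins < k:
        raise ValueError("need at least k bins to return k results")
    best = {}  # bin id -> index of the largest value seen in that bin
    for i, v in enumerate(values):
        b = i % num_bins
        if b not in best or v > values[best[b]]:
            best[b] = i
    survivors = list(best.values())
    return sorted(heapq.nlargest(k, survivors, key=values.__getitem__))


logits = [0.1, 5.0, 0.3, 4.0, 0.2, 3.0, 0.4, 0.5]
print(exact_top_k(logits, 3))      # [1, 3, 5]
print(approx_top_k(logits, 3, 4))  # [1, 3, 6]
print(approx_top_k(logits, 3, 8))  # [1, 3, 5]
```

With 4 bins, index 5 (logit 3.0) shares a bin with index 1 (logit 5.0) and gets dropped, so index 6 (logit 0.4) takes its place. If top-k is used to pick sampling candidates, that means a different set of tokens can be sampled. With one bin per element the result matches the exact version again, which is the usual speed/recall trade-off.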