No, it's definitely changed a lot. The speedups have been massive (GPT-4 runs faster now than 3.5-turbo did at launch), and they can't be explained by just rolling out H100s, since that's only about a 2x inference boost. Barring some unknown in-house optimization, they've probably quantized the models down to a few bits of precision, which increases perplexity quite a bit. They've also continued RLHF tuning to bring the models more in line with their guidelines, and that process was shown to degrade overall performance even before GPT-4 launched.
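
To be clear about why few-bit quantization hurts perplexity: every weight gets rounded to one of a handful of levels, and that rounding error compounds through the network. Here's a toy sketch of the mechanism using simple round-to-nearest, per-tensor symmetric quantization on random weights; OpenAI's actual scheme (if they quantize at all) is unknown, and real deployments typically use fancier methods like GPTQ, but the error trend is the same:

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Round-to-nearest symmetric quantization of a weight tensor,
    then dequantize back to float to expose the rounding error."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=10_000)     # toy weights, transformer-like init scale

for bits in (8, 4, 3):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.6f}")
```

Going from 8-bit to 3-4 bits roughly halves the number of representable levels per bit dropped, so the per-weight error grows fast, and in a model with hundreds of billions of weights that shows up as measurably worse perplexity.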