They said in the announcement that they've implemented speculative decoding, so ...

bubblethink · 2024-10-25T06:43:06 1729838586

Speculative decoding does not trade off accuracy. You reject the speculated tokens if the original model does not accept them, kind of like branch prediction. All these providers and third parties benchmark each other's solutions, so if there is a drop in accuracy, someone will report it. Their sequence length is 8k.