A lot of people are reporting tokens-per-second numbers at small context sizes without discussing how much slower generation gets at larger ones. In some cases they also leave out time to first token (i.e., prompt-processing latency). If you dig around /LocalLLaMA you can find some posts with better benchmarking.
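For what it's worth, here's a minimal sketch of the kind of benchmark that's actually useful to report: time to first token and decode speed measured at several prompt sizes, against a local OpenAI-compatible server (e.g. llama.cpp's llama-server or Ollama). The endpoint URL, model name, the word-per-token heuristic, and the specific context sizes are my assumptions, and counting stream chunks only roughly approximates generated tokens.

```python
# Rough benchmark sketch: TTFT and decode tok/s at several context sizes.
# Assumes an OpenAI-compatible streaming endpoint running locally.
import json
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
MODEL = "local-model"                               # assumed model name

def bench(prompt_tokens_approx: int, max_new_tokens: int = 128) -> None:
    # Pad the prompt with filler to roughly hit the target context size
    # (~0.75 words per token is a rough heuristic, not exact).
    filler = "lorem ipsum " * int(prompt_tokens_approx * 0.75)
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": filler + "\nSummarize the above."}],
        "max_tokens": max_new_tokens,
        "stream": True,
    }
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0  # each streamed delta is roughly one token
    with requests.post(URL, json=payload, stream=True, timeout=600) as r:
        for line in r.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {})
            if delta.get("content"):
                if first_token_at is None:
                    first_token_at = time.perf_counter()
                n_chunks += 1
    end = time.perf_counter()
    ttft = (first_token_at or end) - start
    decode_tps = n_chunks / (end - first_token_at) if first_token_at and n_chunks else 0.0
    print(f"~{prompt_tokens_approx:>6} prompt tokens | "
          f"TTFT {ttft:6.2f}s | ~{decode_tps:5.1f} tok/s decode")

if __name__ == "__main__":
    for size in (512, 4096, 16384):  # small vs. larger contexts
        bench(size)
```

Running it at 512 vs. 16k prompt tokens makes the point immediately: the decode tok/s people like to quote usually holds up, but TTFT can balloon from under a second to a minute or more depending on hardware, which is exactly what the headline numbers hide.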