The issue is that benchmarks for LLMs or model formats are tough to compare, as there are many factors at play. That said, beyond ooba's comparison, many other sources recommend GPTQ or AWQ for GPU inference, since they give better quality at the same quant level (AWQ apparently takes more VRAM than GPTQ, but with better quality in return). Given how many models are out there, I'd take these tests with a grain of salt.
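
If you'd rather sanity-check the VRAM claim yourself than trust any single benchmark, here's a minimal sketch of one way to do it with Hugging Face transformers, assuming you have the GPTQ backend (optimum/auto-gptq) and the AWQ backend (autoawq) installed. The repo IDs are just illustrative examples of the paired GPTQ/AWQ builds TheBloke publishes; swap in whatever model you actually care about.

```python
# Rough comparison of peak VRAM between a GPTQ and an AWQ build of the
# same base model. Not a quality benchmark -- just memory footprint.
import torch
from transformers import AutoModelForCausalLM

def peak_vram_gib(repo_id: str) -> float:
    """Load a quantized checkpoint onto the GPU and report peak VRAM in GiB."""
    torch.cuda.reset_peak_memory_stats()
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="cuda:0")
    peak = torch.cuda.max_memory_allocated() / 1024**3
    del model  # free the weights before loading the next format
    torch.cuda.empty_cache()
    return peak

# Illustrative repo IDs -- substitute the pair you want to compare.
for repo in ("TheBloke/Mistral-7B-v0.1-GPTQ", "TheBloke/Mistral-7B-v0.1-AWQ"):
    print(f"{repo}: {peak_vram_gib(repo):.2f} GiB peak")
```

Even a quick check like this only tells you about memory on your hardware; perplexity or downstream evals are a separate question, which is part of why these comparisons are so hard to pin down.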