IDK about GPT-4 specifically, but I have recently witnessed a case where small finetuned 7Bs greatly outperformed larger models (Mixtral Instruct, Llama 70B finetunes) on a few very specific tasks.
There is nothing unreasonable about this. However, I do dislike it when that information is presented in a fishy way, implying that a model "outperforms GPT-4" without any qualification.