I'm not aware of any other openly-licensed model of comparable size to 57B. That seems like a worthwhile addition to what's already available, imo.
The closest is Mixtral 8x7B, but that one only uses a fraction of its parameters on each pass. This one should produce better but slower results at roughly the same memory requirement.
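To put rough numbers on the memory point, here's my own back-of-the-envelope sketch (assuming fp16 weights and counting only the weights themselves, not KV cache or runtime overhead):

    # Weight memory scales with *total* params, not the active subset per pass.
    def weight_memory_gb(total_params_b: float, bytes_per_param: float = 2.0) -> float:
        """Approximate memory just to hold the weights, in GB."""
        return total_params_b * 1e9 * bytes_per_param / 1024**3

    for name, total_b in [("Mixtral 8x7B", 47), ("Qwen2 57B MoE", 57)]:
        print(f"{name}: ~{weight_memory_gb(total_b):.0f} GB at fp16")
    # Mixtral 8x7B: ~88 GB at fp16
    # Qwen2 57B MoE: ~106 GB at fp16

So around 20% more memory for the new one before quantization, which I'd still call the same rough class.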
Mixtral 8x7B has 13B active parameters per pass (2 experts) on 47B total weights, so it's not that different from the Qwen 2 MoE (14B active on 57B weights). I'd agree that the new model is probably now the strongest option in this "middle-sized" weight class, although Yi 1.5 34B isn't bad (a dense model, so ~2.4x slower inference, but also roughly 40% fewer weights).
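In case anyone wants the speed comparison spelled out, here's a quick sketch using the figures above (my own rough model: per-token decode cost scales with active params, memory with total params):

    models = {
        # name: (active_params_b, total_params_b)
        "Mixtral 8x7B":  (13, 47),
        "Qwen2 57B MoE": (14, 57),
        "Yi 1.5 34B":    (34, 34),  # dense: every weight is active on each pass
    }

    baseline_active = models["Qwen2 57B MoE"][0]
    for name, (active, total) in models.items():
        print(f"{name}: {total}B weights, "
              f"~{active / baseline_active:.1f}x the per-token compute of the Qwen 2 MoE")
    # Mixtral 8x7B: 47B weights, ~0.9x the per-token compute of the Qwen 2 MoE
    # Qwen2 57B MoE: 57B weights, ~1.0x the per-token compute of the Qwen 2 MoE
    # Yi 1.5 34B: 34B weights, ~2.4x the per-token compute of the Qwen 2 MoE

That 34/14 ratio is where the ~2.4x figure for Yi 1.5 34B comes from.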
One nice thing is that all three of these models are Apache 2.0 licensed.