I'm not aware of any other openly-licensed model of comparable size to 57B. That seems like a worthwhile addition to what's already available, imo.
The closest is Mixtral 8x7B, but that one only uses a fraction of its parameters on each pass. This one should produce better but slower results at roughly the same memory requirement.
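To put rough numbers on the memory point, here's my own back-of-the-envelope sketch (assuming fp16 weights and counting only the weights themselves, not KV cache or runtime overhead):

    # Weight memory scales with *total* params, not the active subset per pass.
    def weight_memory_gb(total_params_b: float, bytes_per_param: float = 2.0) -> float:
        """Approximate memory just to hold the weights, in GB."""
        return total_params_b * 1e9 * bytes_per_param / 1024**3

    for name, total_b in [("Mixtral 8x7B", 47), ("Qwen2 57B MoE", 57)]:
        print(f"{name}: ~{weight_memory_gb(total_b):.0f} GB at fp16")
    # Mixtral 8x7B: ~88 GB at fp16
    # Qwen2 57B MoE: ~106 GB at fp16

So around 20% more memory for the new one before quantization, which I'd still call the same rough class.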
Mixtral 8x7B has 13B active parameters per pass (2 experts) on 47B total weights, so it's not that different from the Qwen 2 MoE (14B active on 57B weights). I'd agree that the new model is probably now the strongest option in this "middle-sized" weight class, although Yi 1.5 34B isn't bad (a dense model, so ~2.4x slower inference, but also roughly 40% fewer weights).
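In case anyone wants the speed comparison spelled out, here's a quick sketch using the figures above (my own rough model: per-token decode cost scales with active params, memory with total params):

    models = {
        # name: (active_params_b, total_params_b)
        "Mixtral 8x7B":  (13, 47),
        "Qwen2 57B MoE": (14, 57),
        "Yi 1.5 34B":    (34, 34),  # dense: every weight is active on each pass
    }

    baseline_active = models["Qwen2 57B MoE"][0]
    for name, (active, total) in models.items():
        print(f"{name}: {total}B weights, "
              f"~{active / baseline_active:.1f}x the per-token compute of the Qwen 2 MoE")
    # Mixtral 8x7B: 47B weights, ~0.9x the per-token compute of the Qwen 2 MoE
    # Qwen2 57B MoE: 57B weights, ~1.0x the per-token compute of the Qwen 2 MoE
    # Yi 1.5 34B: 34B weights, ~2.4x the per-token compute of the Qwen 2 MoE

That 34/14 ratio is where the ~2.4x figure for Yi 1.5 34B comes from.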
One nice thing is that all three of these models are Apache 2.0 licensed.