This is pretty exciting. An organization could now produce an open-weights mixture-of-experts model with 8-15b active parameters out of 500b+ total, and it could run locally with INT4 quantization at very fast speeds. DeepSeek R1 is a similar model, but with over 30b active parameters it's a little slow.
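To make the tradeoff concrete, here's a rough back-of-the-envelope sketch (plain Python; the bandwidth and parameter numbers are illustrative assumptions, not measurements) of why the active parameter count dominates local decode speed while the total parameter count mostly just sets the memory footprint:

```python
# Back-of-the-envelope numbers for a wide-but-sparse MoE run locally.
# All figures below are illustrative assumptions, not measurements.

def weight_gb(params_billions: float, bits_per_weight: float = 4) -> float:
    """Approximate weight storage in GB at a given quantization level."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def tokens_per_sec(active_params_billions: float,
                   mem_bandwidth_gb_s: float,
                   bits_per_weight: float = 4) -> float:
    """Rough decode speed assuming generation is memory-bandwidth bound:
    each generated token streams the active parameters from memory once."""
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical 500b-total / 10b-active MoE at INT4 on a machine with
# ~400 GB/s of memory bandwidth:
print(f"total weights: ~{weight_gb(500):.0f} GB")          # ~250 GB
print(f"decode speed:  ~{tokens_per_sec(10, 400):.0f} tok/s")   # ~80 tok/s

# Versus a model with ~37b active parameters (roughly DeepSeek R1's
# activation size) on the same hardware:
print(f"decode speed:  ~{tokens_per_sec(37, 400):.0f} tok/s")   # ~22 tok/s
```

Under these assumptions the sparse model needs a lot of RAM but decodes several times faster per token, which is the whole appeal of narrow-activation MoEs for local use.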
I do not have a good sense of how well quality scales with narrow MoEs, but even if we get something like Llama 3.3 70b quality at only 8b active parameters, people could do a ton locally.