It seems like the speedups here matter most for small models, since on larger models a smaller fraction of the total time is spent launching and swapping between kernels. It would be interesting to see at least theoretical results for LLMs in the 14-70B parameter range, which is what most folks deploy in practice.
And of course the effect on throughput at larger batch sizes, which they allude to at the end.
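Rough back-of-envelope to illustrate (all constants below are my assumptions, nothing from the paper): a memory-bound decode step costs roughly weight-bytes / bandwidth, while launch overhead scales with kernel count, so the overhead share should shrink as the model grows:

```python
# Back-of-envelope sketch: fraction of a single-token decode step spent on
# kernel launch overhead as model size grows. Every constant is a rough
# assumption, not a measurement from the paper.

GPU_BW = 2.0e12          # bytes/s, roughly H100-class HBM bandwidth (assumed)
LAUNCH_OVERHEAD = 5e-6   # seconds per kernel launch, ~5 us (assumed)
KERNELS_PER_LAYER = 30   # rough kernel count per transformer layer (assumed)

def launch_fraction(params_b, n_layers, bytes_per_param=2):
    """Launch-overhead share of one memory-bandwidth-bound decode step."""
    compute_time = params_b * 1e9 * bytes_per_param / GPU_BW  # weight reads
    launch_time = n_layers * KERNELS_PER_LAYER * LAUNCH_OVERHEAD
    return launch_time / (compute_time + launch_time)

for params_b, n_layers in [(1, 16), (8, 32), (14, 40), (70, 80)]:
    frac = launch_fraction(params_b, n_layers)
    print(f"{params_b:>3}B model: ~{frac:.0%} of step time in launches")
```

Even if those constants are off by 2x, the trend holds: at 1B the launch overhead can dominate the step, while at 70B it's a modest slice.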
This could also give a nice speedup for MoE models with 7B-70B total parameters but O(10x) fewer active params, e.g. https://huggingface.co/Qwen/Qwen3-30B-A3B, assuming the expert router can be scheduled effectively within the monolithic megakernel.
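Applying the same napkin math to the MoE case (layer and kernel counts below are assumptions, not taken from the model card): only the active params are read per token, so the launch-overhead share looks like a small dense model despite the large total:

```python
# Same sketch for an MoE like Qwen3-30B-A3B: ~3B active params read per
# token, so launch overhead dominates much like a small dense model.
GPU_BW = 2.0e12          # bytes/s (assumed)
LAUNCH_OVERHEAD = 5e-6   # seconds per launch (assumed)
KERNELS_PER_LAYER = 35   # assume routing/gather adds a few kernels per layer
N_LAYERS = 48            # assumed layer count for Qwen3-30B-A3B
ACTIVE_BYTES = 3e9 * 2   # ~3B active params in bf16

compute = ACTIVE_BYTES / GPU_BW
launches = N_LAYERS * KERNELS_PER_LAYER * LAUNCH_OVERHEAD
print(f"MoE (~3B active): ~{launches / (compute + launches):.0%} of step time in launches")
```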
Overall a very interesting result!