Tri Dao's Team Again Applies Black Magic of Optimization, This Time to Accelerate MoE Training
TG AI News · December 19, 2025 at 2:52 PM
Tri Dao's team is once again applying optimization black magic, this time to speed up MoE (mixture-of-experts) training. SonicMoE is almost twice as fast as the best open-source MoE kernels while using almost half as much memory for storing activations. In practice, this makes training about 1.5 times more efficient: 64 H100s with SonicMoE train a 7B MoE model at the same speed as 96 H100s with the previous best implementation.
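The 1.5x figure follows directly from the two GPU counts quoted above. A minimal sketch of that arithmetic, assuming "same speed" means equal overall training throughput (the variable names are illustrative and not taken from SonicMoE):

```python
# GPU counts quoted in the post.
gpus_baseline = 96  # H100s with the previous best implementation
gpus_sonicmoe = 64  # H100s with SonicMoE at the same training speed

# Assumption: equal total throughput, so per-GPU throughput scales
# by the ratio of the two GPU counts.
per_gpu_speedup = gpus_baseline / gpus_sonicmoe

# Fraction of the baseline hardware no longer needed.
gpu_savings = 1 - gpus_sonicmoe / gpus_baseline

print(f"per-GPU throughput gain: {per_gpu_speedup:.2f}x")  # 1.50x
print(f"GPUs saved: {gpus_baseline - gpus_sonicmoe} ({gpu_savings:.0%})")  # 32 (33%)
```

Put differently, the same run needs a third fewer GPUs, which is smaller than the near-2x kernel speedup, presumably because the MoE kernels are only part of each training step.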