We present T$^\star$, a simple \textsc{TraceRL}-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T$^\star$~transitions smoothly to larger blocks, enabling higher-parallelism decoding with minimal performance degradation on math reasoning benchmarks. Further analysis suggests that T$^\star$~can converge to an alternative decoding schedule $\hat{\mathrm{S}}$ that achieves comparable performance.
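As one hypothetical instantiation of such a curriculum (a sketch, not necessarily the schedule T$^\star$~uses), the block size $b_t$ at training step $t$ could grow geometrically from an initial size $b_0$ toward a target $b_{\max}$, doubling every $\tau$ steps:
\begin{equation*}
    b_t = \min\bigl(b_{\max},\; b_0 \cdot 2^{\lfloor t/\tau \rfloor}\bigr),
\end{equation*}
where $b_0$, $b_{\max}$, and $\tau$ are assumed curriculum hyperparameters.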