Masked diffusion models have shown promising performance in generating high-quality samples across a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. Building on this analysis, we introduce the "moment sampler," an asymptotically equivalent but more tractable and interpretable alternative to MaskGIT, which follows a "choose-then-sample" approach: it selects the positions to unmask before sampling tokens at those positions. In addition, we improve the efficiency of choose-then-sample algorithms through two key innovations: a partial caching technique for transformers that approximates longer sampling trajectories without a proportional increase in computational cost, and a hybrid approach that formalizes the exploration-exploitation trade-off in adaptive unmasking. Experiments in the image and text domains support our theory and demonstrate the efficiency of the proposed methods, advancing both the theoretical understanding and the practical implementation of masked diffusion samplers.
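To make the "choose-then-sample" ordering concrete, the following is a minimal sketch of a single unmasking step, not the moment sampler itself. The abstract does not specify how positions are selected, so the uniform random choice below is a placeholder assumption; the function name `choose_then_sample_step`, the `MASK` id, and the `temperature` parameter are likewise hypothetical and introduced only for illustration.

```python
import numpy as np

MASK = -1  # hypothetical id marking a still-masked position


def choose_then_sample_step(tokens, logits, k, rng, temperature=1.0):
    """Unmask up to k positions: choose the positions first, then sample tokens.

    tokens: 1-D int array of length L, with MASK at masked positions.
    logits: float array of shape (L, V) from some denoiser (assumed given).
    """
    masked = np.flatnonzero(tokens == MASK)
    k = min(k, masked.size)
    out = tokens.copy()
    if k == 0:
        return out

    # Step 1 (choose): pick which masked positions to reveal.
    # Uniform selection is a placeholder assumption, not the paper's rule.
    chosen = rng.choice(masked, size=k, replace=False)

    # Step 2 (sample): draw a token at each chosen position from the
    # temperature-scaled softmax over the vocabulary.
    scaled = logits[chosen] / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    for pos, p in zip(chosen, probs):
        out[pos] = rng.choice(p.size, p=p)
    return out
```

Calling this step repeatedly, with fresh logits from the model each time, would unmask the full sequence. The point of the sketch is only the ordering: the positions are fixed before any token is drawn, so the token distribution at a chosen position does not depend on the selection rule.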