Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes along the masking path. Since they rely on a fixed number of denoising steps, they cannot adapt their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes. Furthermore, we replace the fixed time schedule with a trained stopping criterion, which adapts the number of function evaluations to the difficulty of the reasoning problem. Our adaptation adds only two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset we clearly outperform other flow-based methods with a validity of 95%. On Countdown-4, we need on average only 10 steps to solve almost 96% of the problems correctly, and many problems are solved in as few as 2 steps.
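The sampling loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `MASK`, `predict`, `halt`, and `confidence_threshold` are hypothetical stand-ins for the model's mask token, its token-prediction head, its trained stopping head, and a remasking rule, which the paper learns rather than hard-codes.

```python
MASK = -1  # hypothetical sentinel id for the mask token


def denoise_step(seq, predict, confidence_threshold=0.9):
    """One transition of the learned kernel: fill masked positions with
    the model's predictions, and remask tokens whose confidence is low
    so a later step can revise them (the self-correction mechanism)."""
    out = []
    for i, tok in enumerate(seq):
        pred_tok, conf = predict(seq, i)  # (predicted token, confidence)
        if tok == MASK or conf < confidence_threshold:
            # commit the prediction only when the model is confident;
            # otherwise keep (or restore) the mask
            out.append(pred_tok if conf >= confidence_threshold else MASK)
        else:
            out.append(tok)
    return out


def sample(seq_len, predict, halt, max_steps=64):
    """Iterate the transition kernel until the trained stopping head
    signals convergence, adapting compute to problem difficulty."""
    seq = [MASK] * seq_len
    for step in range(1, max_steps + 1):
        seq = denoise_step(seq, predict)
        if halt(seq):  # trained stopping criterion (assumed interface)
            break
    return seq, step
```

With an oracle `predict` that is always confident, the loop halts after a single step; a harder instance would trigger more remask-and-refine iterations before `halt` fires, which is how the step count adapts per problem.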