Standard masked discrete diffusion models are limited on reasoning tasks because they cannot correct their own mistakes along the masking path. Because they rely on a fixed number of denoising steps, they also cannot adapt their computation to the complexity of a given problem. To address these limitations, we introduce a method that learns a Markov transition kernel trained on its own outputs. This design allows tokens to be remasked, so the model can correct its previous mistakes. Furthermore, instead of a fixed time schedule we use a trained stopping criterion, which adapts the number of function evaluations to the difficulty of the reasoning problem. Our adaptation adds two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset, we clearly outperform other flow-based methods with a validity of 95%. On Countdown-4, we need on average only 10 steps to solve almost 96% of the problems correctly, and many problems can be solved in as few as 2 steps.
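The sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two heads, so the sketch assumes one produces a per-position keep probability (positions that are not kept are remasked) and the other a scalar stop probability. The names `sample`, `toy_step`, `TARGET`, and `MASK` are hypothetical, and a hand-written rule stands in for the trained network.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = -1  # hypothetical mask token id

# Stand-in for the trained network. Given the current sequence it returns
# (token proposals, per-position keep probabilities, stop probability).
# This output layout is an assumption made for illustration only.
StepFn = Callable[[List[int]], Tuple[List[int], List[float], float]]


def sample(
    step_fn: StepFn,
    x: List[int],
    max_steps: int = 64,
    stop_threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int]:
    """Iterate a Markov kernel until the stop signal fires or max_steps is hit."""
    rng = rng or random.Random(0)
    x = list(x)
    for n_steps in range(1, max_steps + 1):
        proposals, keep_prob, stop_prob = step_fn(x)
        # The next state depends only on the current state x. A position that
        # is not kept becomes MASK again, even if it held a token before.
        x = [tok if rng.random() < p else MASK
             for tok, p in zip(proposals, keep_prob)]
        # Learned stopping replaces a fixed step schedule (assumed rule:
        # stop once nothing is masked and the stop probability is high).
        if MASK not in x and stop_prob >= stop_threshold:
            return x, n_steps
    return x, max_steps


TARGET = [3, 1, 4, 1, 5]  # toy "solution" known only to the toy model


def toy_step(x: List[int]) -> Tuple[List[int], List[float], float]:
    """Toy rule: remask wrong tokens, fill at most two masked positions."""
    proposals = [TARGET[i] if tok == MASK else tok for i, tok in enumerate(x)]
    keep: List[float] = []
    budget = 2  # number of masked positions revealed per step
    for i, tok in enumerate(x):
        if tok == MASK:
            keep.append(1.0 if budget > 0 else 0.0)
            budget = max(budget - 1, 0)
        else:
            # Committed tokens are kept only if correct, otherwise remasked.
            keep.append(1.0 if tok == TARGET[i] else 0.0)
    stop_prob = 1.0 if all(p == 1.0 for p in keep) else 0.0
    return proposals, keep, stop_prob


if __name__ == "__main__":
    # The initial state contains a wrong token (9) that must be corrected.
    print(sample(toy_step, [3, 9, MASK, MASK, MASK]))
```

In this toy run the wrong token is remasked in the first step and refilled in the second, after which the stop signal ends sampling; harder starting states would simply take more iterations, which is the adaptive behavior the abstract refers to.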