Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages over conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models, which make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT on multi-digit multiplication, boolean logic, and grade-school math problems, with a small diffusion model outperforming a much larger autoregressive model in both efficiency and accuracy. In addition, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques such as self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.