Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, because early mistakes propagate irreversibly through autoregressive decoding. We propose DiffCoT, a diffusion-style CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT applies diffusion principles at the reasoning-step level through a sliding-window mechanism, which unifies generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones show that DiffCoT consistently outperforms existing CoT preference optimization methods, improving robustness and error-correction capability in CoT reasoning.
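To make the sliding-window and causal-schedule ideas concrete, the following is a minimal illustrative sketch, not the DiffCoT implementation. The abstract does not specify the schedule, so everything here is an assumption: the function names (`causal_noise_levels`, `sliding_windows`, `step_noise_trace`), the linear ramp, and the one-step stride are all hypothetical. The sketch assumes only that earlier steps in a window carry less noise than later ones, so a step is progressively refined as the window moves past it.

```python
from typing import Dict, Iterator, List, Tuple


def causal_noise_levels(window_size: int, max_level: float = 1.0) -> List[float]:
    """Assumed linear ramp: earlier positions in the window get less noise."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return [max_level * (i + 1) / window_size for i in range(window_size)]


def sliding_windows(num_steps: int, window_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) step indices, advancing one reasoning step at a time."""
    last_start = max(num_steps - window_size, 0)
    for start in range(last_start + 1):
        yield start, min(start + window_size, num_steps)


def step_noise_trace(num_steps: int, window_size: int) -> List[Dict[int, float]]:
    """For each window position, map reasoning-step index -> assumed noise level."""
    levels = causal_noise_levels(window_size)
    trace = []
    for start, end in sliding_windows(num_steps, window_size):
        trace.append({step: levels[step - start] for step in range(start, end)})
    return trace


if __name__ == "__main__":
    # Hypothetical chain of 5 reasoning steps with a window of 3 steps.
    for window in step_noise_trace(num_steps=5, window_size=3):
        print(window)
```

Under these assumptions, a given reasoning step enters the window at the highest noise level and is revisited at lower levels as the window slides forward, which is one way to picture retrospective correction of intermediate steps.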