Recent advances in block diffusion language models (BDLMs) have demonstrated competitive performance and strong scalability on reasoning tasks. However, existing BDLMs remain underexplored in the test-time scaling setting and face more severe decoding challenges in long chain-of-thought reasoning, particularly in balancing decoding speed against effectiveness. In this work, we propose a unified framework for test-time scaling in BDLMs that introduces adaptivity at both the decoding and block-wise generation levels. At the decoding level, we propose Bounded Adaptive Confidence Decoding (BACD), a difficulty-aware sampling strategy that dynamically adjusts the denoising schedule based on model confidence, accelerating inference while controlling error accumulation. Beyond step-wise adaptivity, we introduce Think Coarse, Critic Fine (TCCF), a test-time scaling paradigm that allocates large block sizes to exploratory reasoning and smaller block sizes to refinement, striking an effective balance between efficiency and effectiveness. To enable efficient and effective decoding at large block sizes, we adopt Progressive Block Size Extension, which mitigates the performance degradation that arises when scaling up the block size. Extensive experiments show that applying BACD and TCCF to TDAR-8B yields significant improvements over strong baselines such as TraDo-8B (2.26x speedup and +11.2 points on AIME24). These results mark an important step toward unlocking the potential of BDLMs for test-time scaling on complex reasoning tasks.
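The abstract does not spell out BACD's mechanics, but the stated idea (unmask more tokens per denoising step when the model is confident, fewer when it is not, with a bound that limits error accumulation) can be illustrated with a minimal sketch. All names and parameter values below are hypothetical, not the authors' implementation:

```python
def bounded_adaptive_unmask(confidences, threshold=0.9, min_tokens=1, max_tokens=4):
    """Choose which masked positions in a block to commit this denoising step.

    confidences: per-position model confidence (e.g. max softmax probability)
    for the still-masked positions of the current block.

    Adaptive part: commit every position whose confidence exceeds `threshold`,
    so easy (high-confidence) blocks advance in fewer steps.
    Bounded part: clamp the number of commits per step to
    [min_tokens, max_tokens], so decoding always makes progress on hard
    blocks yet never commits too many low-quality tokens at once.
    Returns the selected position indices in ascending order.
    """
    # Rank positions from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    above = sum(c > threshold for c in confidences)
    k = min(max(above, min_tokens), max_tokens, len(confidences))
    return sorted(order[:k])

# Easy block: five confident positions, capped at max_tokens=4.
print(bounded_adaptive_unmask([0.99, 0.97, 0.95, 0.96, 0.98, 0.40]))  # [0, 1, 3, 4]
# Hard block: nothing above threshold, still commits min_tokens=1.
print(bounded_adaptive_unmask([0.30, 0.20, 0.50, 0.10]))  # [2]
```

In a full decoder this rule would sit inside the per-block denoising loop, with the threshold and bounds acting as the knobs that trade speed for error control.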