We propose a speculative decoding method tailored to multi-sample reasoning scenarios such as self-consistency and Best-of-N sampling. The method exploits the intrinsic consensus among parallel generation paths to synthesize high-quality draft tokens, without requiring auxiliary models or external databases. A probabilistic aggregation mechanism dynamically analyzes structural patterns across the parallel reasoning paths and identifies consensus token sequences that align with the decoding distribution. Evaluations on mathematical reasoning benchmarks show substantially higher draft acceptance rates than the baselines, together with lower latency in draft-token construction. This work enables speculative decoding to be integrated seamlessly with sampling-based reasoning techniques, making multi-sample inference more efficient.
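To make the drafting idea concrete, the following is a minimal sketch of how draft tokens for one path might be assembled from its sibling paths. It is not the paper's algorithm: the abstract does not specify the probabilistic aggregation mechanism, so the sketch substitutes a simple n-gram match with majority voting. The function name `build_consensus_draft` and the parameters `ngram`, `max_draft`, and `min_share` are hypothetical, and the verification of the draft by the target model (the step that decides acceptance) is omitted.

```python
from collections import Counter


def build_consensus_draft(paths, target, ngram=2, max_draft=4, min_share=0.5):
    """Propose draft tokens for paths[target] from its sibling paths.

    Assumption (not from the paper): consensus is approximated by voting
    over the tokens that siblings produced after the same trailing n-gram.
    """
    context = list(paths[target])
    siblings = [p for i, p in enumerate(paths) if i != target]
    draft = []
    for _ in range(max_draft):
        if len(context) < ngram:
            break
        key = tuple(context[-ngram:])
        votes = Counter()
        for p in siblings:
            # Scan every position where the sibling has a following token.
            for j in range(len(p) - ngram):
                if tuple(p[j:j + ngram]) == key:
                    votes[p[j + ngram]] += 1
        if not votes:
            break  # no sibling has seen this context
        token, count = votes.most_common(1)[0]
        # Stop drafting when the siblings disagree too much.
        if count / sum(votes.values()) < min_share:
            break
        draft.append(token)
        context.append(token)
    return draft


if __name__ == "__main__":
    paths = [
        ["x", "=", "2", "+", "3", "=", "5"],
        ["so", "2", "+", "3", "=", "5", "."],
        ["we", "get", "2", "+"],
    ]
    print(build_consensus_draft(paths, target=2, max_draft=3))
```

In a full pipeline the returned draft would be scored by the target model in a single forward pass and accepted only up to the first token that fails verification, so a poor draft costs speed but does not change the output distribution.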