Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that uniformly allocating more inference-time computation improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens contributes disproportionately to reasoning errors and unnecessary output expansion. Motivated by this observation, we propose Thinking by Subtraction, realized as Confidence-Driven Contrastive Decoding (CCD), a training-free method that improves reasoning reliability through targeted token-level intervention. CCD detects low-confidence tokens during decoding and intervenes selectively at those positions: it constructs a contrastive reference by replacing high-confidence tokens with minimal placeholders, and refines predictions by subtracting this reference distribution at low-confidence locations. Experiments show that CCD significantly improves accuracy across mathematical reasoning benchmarks while substantially reducing output length, with minimal KV-cache overhead. By intervening only where confidence is low, CCD enhances reasoning reliability without computational redundancy. Our code will be made available at: https://github.com/bolo-web/CCD.
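The selective intervention described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the confidence threshold `tau`, the contrast weight `alpha`, and the log-space subtraction rule are assumptions; `ref_logits` stands in for the logits produced from the placeholder-masked contrastive context.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ccd_step(logits, ref_logits, tau=0.7, alpha=1.0):
    """One CCD-style decoding step (sketch).

    logits:     next-token logits from the full context.
    ref_logits: next-token logits from the contrastive reference
                (high-confidence tokens replaced by placeholders).
    tau, alpha: assumed hyperparameters, not values from the paper.
    """
    p = softmax(logits)
    if max(p) >= tau:
        # High-confidence position: decode normally, no intervention.
        return p
    # Low-confidence position: subtract the reference in log space,
    # amplifying what the full context knows beyond the placeholder one.
    contrast = [(1 + alpha) * l - alpha * r
                for l, r in zip(logits, ref_logits)]
    return softmax(contrast)
```

For example, a confidently peaked distribution passes through unchanged, while at a low-confidence position a token that the reference also favors gets demoted, steering decoding toward tokens supported specifically by the full context.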