Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing methods are restricted to final answers or require internal model access. In this paper, we introduce Stepwise Confidence Attribution (SCA), a framework for closed-source LLMs that assigns step-level confidence based only on generated reasoning traces. SCA applies the Information Bottleneck principle: steps aligning with consensus structures across correct solutions receive high confidence, while deviations are flagged as potentially erroneous. We propose two complementary methods: (1) NIBS, a non-parametric IB approach measuring consistency without graph structures, and (2) GIBS, a graph-based IB model that learns subgraphs through a differentiable mask to capture logical variability. Extensive experiments on mathematical reasoning and multi-hop question answering show that SCA reliably identifies low-confidence steps strongly correlated with reasoning errors. Moreover, using step-level confidence to guide self-correction improves the correction success rate by up to 13.5\% over answer-level feedback.
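To make the consensus idea concrete, the sketch below scores each step of a target trace by how closely it matches steps in a set of reference traces. It is an illustrative stand-in only, not NIBS or GIBS: the token-level Jaccard similarity, the 0.5 threshold, and the names `step_confidences` and `flag_low_confidence` are all assumptions introduced here, and the reference traces are assumed to be already split into steps.

```python
from __future__ import annotations

import re


def tokens(step: str) -> set[str]:
    """Lowercased alphanumeric tokens of one reasoning step."""
    return set(re.findall(r"[a-z0-9]+", step.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def step_confidences(target: list[str], references: list[list[str]]) -> list[float]:
    """Hypothetical consensus score for each step of `target`.

    For every reference trace, take the best Jaccard match between the
    target step and any step of that trace, then average over traces.
    This is an assumed heuristic, not the paper's IB objective.
    """
    ref_tokens = [[tokens(s) for s in trace] for trace in references]
    scores = []
    for step in target:
        t = tokens(step)
        best_per_trace = [
            max((jaccard(t, r) for r in trace), default=0.0)
            for trace in ref_tokens
        ]
        if best_per_trace:
            scores.append(sum(best_per_trace) / len(best_per_trace))
        else:
            scores.append(0.0)
    return scores


def flag_low_confidence(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Indices of steps whose score falls below the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    references = [
        ["x = 3 + 4 = 7", "y = 7 * 2 = 14", "answer is 14"],
        ["x = 3 + 4 = 7", "y = 7 * 2 = 14", "the answer is 14"],
    ]
    target = ["x = 3 + 4 = 7", "y = 7 * 3 = 21", "answer is 21"]
    scores = step_confidences(target, references)
    print(scores, flag_low_confidence(scores))
```

In this toy example the first step agrees with every reference trace, while the step containing the arithmetic slip and the answer derived from it score lower and are flagged. Surface token overlap is a crude proxy; it only illustrates the shape of a step-level signal computed from generated traces alone, without model internals.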