Large Language Models (LLMs) that use Chain-of-Thought (CoT) prompting have broadened the scope for improving multi-step reasoning. Answer calibration strategies, such as step-level or path-level calibration, typically play a vital role in multi-step reasoning. Although these strategies are effective, the key factors that drive their success remain poorly understood. In this paper, we break down the design of recent answer calibration strategies and present a unified view that establishes connections between them. We then conduct a thorough evaluation of these strategies from this unified view, systematically scrutinizing step-level and path-level answer calibration across multiple paths. Our study has the potential to yield key insights for optimizing multi-step reasoning with answer calibration.