In this work, we propose a novel method named \textbf{Auto}mated Process Labeling via \textbf{C}onfidence \textbf{V}ariation (\textbf{\textsc{AutoCV}}) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification model's confidence scores across reasoning steps to automatically annotate the reasoning process. This alleviates the need for extensive manual annotation and avoids the high computational costs associated with model-induced annotation approaches. We experimentally validate that the confidence variations learned by the verification model trained on final-answer correctness can effectively identify errors in the reasoning steps. Subsequently, we demonstrate that the process annotations generated by \textsc{AutoCV} improve the accuracy of the verification model in selecting the correct answer from multiple outputs generated by LLMs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of \textsc{AutoCV} is available at \url{https://github.com/rookie-joe/AUTOCV}.
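The core labeling rule described above can be sketched in a few lines: a verifier assigns each step a confidence of reaching the correct final answer, and a step is flagged as erroneous when that confidence drops sharply relative to the previous step. The scores, threshold, and label convention below are illustrative assumptions, not values from the paper.

```python
def label_steps(confidences, threshold=0.2):
    """Label reasoning steps by relative confidence variation.

    confidences: the verifier's per-step probabilities of reaching the
    correct final answer, each in (0, 1].
    A step is labeled "-" (likely erroneous) when its confidence falls by
    more than `threshold` relative to the preceding step, else "+".
    The threshold value is a hypothetical choice for illustration.
    """
    labels = ["+"]  # the first step has no predecessor to compare against
    prev = confidences[0]
    for c in confidences[1:]:
        rel_change = (c - prev) / prev
        labels.append("-" if rel_change < -threshold else "+")
        prev = c
    return labels

# A sharp drop at step 3 marks it as the likely point of error.
print(label_steps([0.90, 0.88, 0.35, 0.30]))  # ['+', '+', '-', '+']
```

This is only a minimal sketch: it shows why annotations come for free once a final-answer verifier exists, since no human needs to judge individual steps.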