Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to assess solutions accurately. To mitigate this limitation of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of the current step together with an explanation). In this paper, we propose \textbf{Math-Minos}, a verifier enhanced with natural language feedback, built from automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small set (30k) of natural language feedback examples can significantly boost the verifier's performance, raising accuracy by 1.6 points (86.6\% $\rightarrow$ 88.2\%) on GSM8K and 0.8 points (37.8\% $\rightarrow$ 38.6\%) on MATH. We have released our code and data for further exploration.