Mathematical verifiers have proven successful in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to assess solutions accurately. To mitigate this insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of each step together with an explanation). In this paper, we propose \textbf{Math-Minos}, a verifier enhanced with natural language feedback. It is built from automatically generated training data and trained with a two-stage paradigm that enables effective training and efficient inference. Our experiments show that a small set of 30k natural language feedback instances significantly improves the verifier's accuracy, by 1.6\% on GSM8K (86.6\% $\rightarrow$ 88.2\%) and by 0.8\% on MATH (37.8\% $\rightarrow$ 38.6\%). We have released our code and data for further exploration.