Logical reasoning has been an ongoing pursuit in the field of AI. Despite significant advancements made by large language models (LLMs), they still struggle with complex logical reasoning problems. To enhance reasoning performance, one promising direction is scalable oversight, which requires LLMs to identify their own errors and then improve by themselves. Various self-verification methods have been proposed in pursuit of this goal. Nevertheless, whether existing models understand their own errors well is still under investigation. In this paper, we take a closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies accurately. We introduce a dataset, FALLACIES, containing 232 types of reasoning fallacies categorized in a hierarchical taxonomy. By conducting exhaustive experiments on FALLACIES, we obtain comprehensive and detailed analyses of a series of models on their verification abilities. Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods. Drawing from these observations, we offer suggestions for future research and practical applications of self-verification methods.