Large Language Models (LLMs) have recently drawn significant attention for their strong reasoning capabilities and extensive stored knowledge, which position them ahead of other language models on a wide range of natural language processing tasks. In this paper, we present a preliminary investigation into the potential of LLMs for fact-checking. We comprehensively evaluate a variety of LLMs on specific fact-checking subtasks, systematically assess their capabilities, and compare their performance against pre-trained models and state-of-the-art models with far fewer parameters. Experiments show that LLMs achieve performance competitive with these smaller models in most scenarios. However, they struggle with Chinese fact verification and with the full end-to-end fact-checking pipeline, owing to language inconsistencies and hallucinations. These findings reveal both the potential of LLMs in fact-checking and the challenges that remain, underscoring the need for further research to make LLMs reliable fact-checkers.