The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also made it harder to identify factual errors in the generated text. In particular: (1) a wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models; (2) generated texts tend to be lengthy and lack a clearly defined granularity for individual facts; (3) explicit evidence is scarce during the fact-checking process. With these challenges in mind, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool, with an associated ChatGPT plugin interface, at https://github.com/GAIR-NLP/factool.