Detecting factual errors in textual information, whether generated by large language models (LLMs) or curated by humans, is crucial for making informed decisions. LLMs' inability to attribute their claims to external knowledge and their tendency to hallucinate make it difficult to rely on their responses. Humans, too, are prone to factual errors in their writing. Since manually detecting and correcting factual errors is labor-intensive, an automatic approach can greatly reduce human effort. We present FLEEK, a prototype tool that automatically extracts factual claims from text, gathers evidence from external knowledge sources, evaluates the factuality of each claim, and suggests revisions for identified errors using the collected evidence. An initial empirical evaluation on factual error detection (77-85% F1) shows the potential of FLEEK. A video demo of FLEEK can be found at https://youtu.be/NapJFUlkPdQ.
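To make the four stages (claim extraction, evidence retrieval, verification, revision) concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is not FLEEK's implementation: the `Claim` and `Verdict` types, the regex-based extractor that only recognizes "X is the capital of Y" sentences, and the in-memory `KNOWLEDGE` dictionary standing in for an external knowledge source are all hypothetical simplifications chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """A factual claim as a (subject, relation, object) triple (hypothetical format)."""
    subject: str
    relation: str
    obj: str


@dataclass
class Verdict:
    claim: Claim
    supported: Optional[bool]  # None means no evidence was found
    evidence: Optional[str]
    revision: Optional[str]


# Hypothetical stand-in for an external knowledge source:
# maps (subject, relation) to the object the source asserts.
KNOWLEDGE = {
    ("Paris", "capital of"): "France",
    ("Berlin", "capital of"): "Germany",
}


def extract_claims(text: str) -> list[Claim]:
    """Toy extractor: only matches sentences of the form 'X is the capital of Y'."""
    return [Claim(s, r, o) for s, r, o in re.findall(r"(\w+) is the (capital of) (\w+)", text)]


def retrieve_evidence(claim: Claim, kb: dict) -> Optional[str]:
    """Look up what the knowledge source says about the claim's subject and relation."""
    return kb.get((claim.subject, claim.relation))


def verify(claim: Claim, evidence: Optional[str]) -> Optional[bool]:
    """Supported if the evidence agrees with the claim; None if there is no evidence."""
    if evidence is None:
        return None
    return evidence == claim.obj


def suggest_revision(claim: Claim, evidence: str) -> str:
    """Rewrite the claim using the retrieved evidence as the object."""
    return f"{claim.subject} is the {claim.relation} {evidence}"


def check_text(text: str, kb: dict) -> list[Verdict]:
    """Run extraction, retrieval, verification and revision over every claim."""
    verdicts = []
    for claim in extract_claims(text):
        evidence = retrieve_evidence(claim, kb)
        supported = verify(claim, evidence)
        # Only refuted claims (supported is False) get a suggested revision.
        revision = suggest_revision(claim, evidence) if supported is False else None
        verdicts.append(Verdict(claim, supported, evidence, revision))
    return verdicts


if __name__ == "__main__":
    for v in check_text("Paris is the capital of France. Berlin is the capital of Austria.", KNOWLEDGE):
        print(v.claim, v.supported, v.revision)
```

In this toy run the first claim is supported and the second is flagged with the suggested revision "Berlin is the capital of Germany". A real system would replace each stage with far stronger components (for example, model-based claim extraction and retrieval over web or knowledge-graph sources), but keeping the stages as separate functions makes it easy to swap them independently.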