Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking built on the retrieval of accurate and trustworthy evidence. Previous methods rely on semantic and social-contextual patterns learned from training data, which limits their generalization to new data distributions. Recently, Retrieval-Augmented Generation (RAG) based methods have been proposed to exploit the reasoning capability of LLMs over retrieved documents that serve as grounding evidence. However, these methods rely largely on textual similarity for evidence retrieval and struggle to retrieve evidence that captures multi-hop semantic relations within rich document contents. As a result, subtle factual correlations between the evidence and the claims to be verified are overlooked during retrieval, leading to inaccurate veracity predictions. To address these issues, we propose WKGFC, which exploits authoritative open knowledge graphs as a core source of evidence. An LLM-enabled retriever assesses each claim and retrieves the most relevant knowledge subgraphs, forming structured evidence for fact verification. To complement the knowledge-graph evidence, we additionally retrieve web content. The whole process is formulated as a Markov Decision Process (MDP): a reasoning LLM agent decides which action to take based on the current evidence and the claim. To adapt the MDP to fact-checking, we tune the agentic LLM via prompt optimization.
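The MDP formulation above can be sketched as a simple agent loop; the action set, the rule-based policy, and the placeholder retrieval results below are illustrative assumptions (in WKGFC the policy is realized by the prompt-optimized LLM agent, and retrieval queries a real knowledge graph and the web):

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """MDP state: the claim under check plus evidence gathered so far."""
    claim: str
    evidence: list = field(default_factory=list)

def policy(state: State) -> str:
    # Stand-in for the LLM agent's decision: gather KG evidence first,
    # complement it with web content, then predict a verdict.
    if not any(src == "kg" for src, _ in state.evidence):
        return "retrieve_kg"
    if not any(src == "web" for src, _ in state.evidence):
        return "retrieve_web"
    return "predict"

def step(state: State, action: str):
    """State transition; returns (next_state, verdict or None)."""
    if action == "retrieve_kg":
        # Placeholder for a retrieved knowledge subgraph (triples).
        state.evidence.append(("kg", "(Paris, capital_of, France)"))
        return state, None
    if action == "retrieve_web":
        # Placeholder for retrieved web content completing the KG evidence.
        state.evidence.append(("web", "Encyclopedia snippet about Paris"))
        return state, None
    # Terminal action: emit a veracity label from the collected evidence.
    return state, "SUPPORTED"

def fact_check(claim: str) -> str:
    state, verdict = State(claim), None
    while verdict is None:
        state, verdict = step(state, policy(state))
    return verdict
```

The loop terminates once the agent chooses the `predict` action; in the full system each branch would instead be an LLM call or a retrieval query.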