We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources, yielding more factual rationales and fewer hallucinations in generation. CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If the sampled answers show no majority consensus, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources, such as Wikidata and tables, that provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that produces queries in various query languages, including SPARQL, SQL, and natural-language sentences. Moreover, to minimize error propagation between rationales, CoK corrects them progressively, using the preceding corrected rationales to generate and correct subsequent ones. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
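The three-stage control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, signature, and the `max_steps` cap is hypothetical, the LLM and retrieval components are passed in as plain callables, and returning the majority answer directly when consensus exists is an assumption inferred from the abstract's wording.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: one sample is (rationale steps, answer).
Sample = Tuple[List[str], str]


def has_majority(answers: Sequence[str]) -> bool:
    """Return True if a single answer makes up more than half of the samples."""
    if not answers:
        return False
    _, count = Counter(answers).most_common(1)[0]
    return count > len(answers) / 2


def chain_of_knowledge(
    question: str,
    sample: Callable[[str], List[Sample]],
    identify_domains: Callable[[str], List[str]],
    generate_step: Callable[[str, List[str]], Optional[str]],
    correct_step: Callable[[str, List[str], List[str]], str],
    consolidate: Callable[[str, List[str]], str],
    max_steps: int = 5,  # assumed safety cap, not taken from the abstract
) -> str:
    # Stage 1: reasoning preparation. Sample preliminary rationales and
    # answers, and identify the relevant knowledge domains.
    samples = sample(question)
    answers = [answer for _, answer in samples]
    domains = identify_domains(question)

    # Assumption: with a majority consensus, the majority answer is returned
    # and the knowledge adapting stage is skipped.
    if has_majority(answers):
        return Counter(answers).most_common(1)[0][0]

    # Stage 2: dynamic knowledge adapting. Each new rationale is generated
    # from the already corrected ones, then corrected itself, so an early
    # error is not carried into later steps.
    corrected: List[str] = []
    for _ in range(max_steps):
        draft = generate_step(question, corrected)
        if draft is None:  # the generator signals that the chain is complete
            break
        corrected.append(correct_step(draft, domains, corrected))

    # Stage 3: answer consolidation over the corrected rationales.
    return consolidate(question, corrected)
```

In this sketch, `correct_step` is where an adaptive query generator would sit: given a draft rationale and the identified domains, it would issue a SPARQL, SQL, or natural-language query against the matching knowledge source and rewrite the rationale using the retrieved facts.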