We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. This yields more factual rationales and reduces hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If the sampled answers reach no majority consensus, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables, which provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that can produce queries in various query languages, including SPARQL, SQL, and natural sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively: each subsequent rationale is generated and corrected based on the preceding, already-corrected rationales. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
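The three-stage control flow described above can be sketched as follows. This is a minimal, hypothetical sketch, not CoK's actual implementation: the LLM calls (`sample_fn`, `generate_next`, `correct`, `consolidate`) are placeholder callables, and `retrieve` stands in for the adaptive query generator together with a knowledge source (for example, SPARQL against Wikidata, SQL against a table, or natural-sentence queries against text). Correcting only the first sample's rationale and using a strict-majority threshold for consensus are likewise assumptions of this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    rationale: List[str]  # ordered rationale steps produced by the LLM
    answer: str


def majority_answer(samples: List[Sample]) -> Optional[str]:
    """Return the answer shared by a strict majority of samples, else None."""
    answer, freq = Counter(s.answer for s in samples).most_common(1)[0]
    return answer if freq * 2 > len(samples) else None


def progressive_correction(
    question: str,
    first_step: str,
    n_steps: int,
    generate_next: Callable[[str, List[str]], str],  # hypothetical LLM call
    retrieve: Callable[[str], str],  # hypothetical: query generator + knowledge source
    correct: Callable[[str, str], str],  # hypothetical LLM call
) -> List[str]:
    """Correct rationale steps in order; each new step is drafted from the
    already-corrected prefix, so an early error is not carried forward."""
    corrected: List[str] = []
    step = first_step
    for i in range(n_steps):
        evidence = retrieve(step)  # ground this step in external knowledge
        corrected.append(correct(step, evidence))
        if i + 1 < n_steps:
            step = generate_next(question, corrected)
    return corrected


def chain_of_knowledge(
    question: str,
    sample_fn: Callable[[str], Sample],
    k: int,
    generate_next: Callable[[str, List[str]], str],
    retrieve: Callable[[str], str],
    correct: Callable[[str, str], str],
    consolidate: Callable[[str, List[str]], str],
) -> str:
    # Stage 1: reasoning preparation (k sampled rationales and answers).
    samples = [sample_fn(question) for _ in range(k)]
    consensus = majority_answer(samples)
    if consensus is not None:
        return consensus  # samples agree: skip knowledge adapting
    # Stage 2: dynamic knowledge adapting on one sample's rationale (assumption).
    base = samples[0]
    corrected = progressive_correction(
        question, base.rationale[0], len(base.rationale),
        generate_next, retrieve, correct,
    )
    # Stage 3: answer consolidation from the corrected rationale.
    return consolidate(question, corrected)
```

Passing the growing `corrected` list to `generate_next` is what limits error propagation in this sketch: every later step is drafted only from predecessors that have already been checked against retrieved evidence, rather than from the original, possibly flawed, rationale.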