Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Equipped with Chain-of-Thought (CoT) prompting, large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, because they hallucinate and cannot access external knowledge, LLMs often produce incorrect or unfaithful intermediate reasoning steps, especially on knowledge-intensive tasks such as knowledge base question answering (KBQA). To alleviate this issue, we propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) that verifies and modifies the reasoning traces in CoT through interaction with external knowledge, thereby overcoming hallucination and error propagation. Concretely, we formulate the CoT rationale process of LLMs as a structured multi-round QA format. In each round, the LLM interacts with a QA system that retrieves external knowledge, and then produces faithful reasoning traces based on the precise answers retrieved. The structured CoT reasoning of LLMs is facilitated by the KBQA CoT collection we developed, which serves as in-context learning (ICL) demonstrations and can also be used as feedback augmentation to train a robust retriever. Extensive experiments on the WebQSP and ComplexWebQuestions datasets demonstrate the effectiveness of the proposed KD-CoT in task-solving reasoning generation: it outperforms vanilla CoT ICL by 8.0% and 5.1% in absolute success rate, respectively. Furthermore, our proposed feedback-augmented retriever outperforms state-of-the-art baselines for knowledge retrieval, achieving significant improvements in Hit and recall. Our code and data are released at https://github.com/AdelWang/KD-CoT/tree/main.

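The multi-round interaction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Step`, `Round`, `kd_cot`, `toy_llm`, `toy_qa_system`, and `TOY_KB` are hypothetical, the LLM is replaced by a scripted stub, and the QA system is replaced by a dictionary lookup. The sketch only shows the control flow assumed here: in each round the model proposes a sub-question with a draft answer, the QA system is consulted, and the retrieved answer overrides the draft before the next round.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One proposal from the (hypothetical) LLM wrapper."""
    sub_question: Optional[str]  # None means the model is ready to finish
    draft_answer: str            # the model's own answer, possibly hallucinated


@dataclass
class Round:
    """One completed round of the structured multi-round QA trace."""
    sub_question: str
    draft_answer: str
    verified_answer: str


def kd_cot(
    question: str,
    propose_step: Callable[[str, List[Round]], Step],
    qa_system: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, List[Round]]:
    """Run the verify-and-modify loop; returns (final answer, trace)."""
    trace: List[Round] = []
    for _ in range(max_rounds):
        step = propose_step(question, trace)
        if step.sub_question is None:
            return step.draft_answer, trace
        retrieved = qa_system(step.sub_question)
        # Prefer the retrieved answer; keep the draft only if nothing was found.
        verified = retrieved if retrieved is not None else step.draft_answer
        trace.append(Round(step.sub_question, step.draft_answer, verified))
    # Round budget exhausted: fall back to the last verified answer, if any.
    return (trace[-1].verified_answer if trace else ""), trace


# Toy stand-ins (assumptions for illustration only).
TOY_KB = {
    "Which country is the Louvre in?": "France",
    "What is the capital of France?": "Paris",
}


def toy_qa_system(sub_question: str) -> Optional[str]:
    return TOY_KB.get(sub_question)


def toy_llm(question: str, trace: List[Round]) -> Step:
    if not trace:
        # Deliberately wrong draft answer, to show the correction step.
        return Step("Which country is the Louvre in?", "Italy")
    if len(trace) == 1:
        country = trace[0].verified_answer
        return Step(f"What is the capital of {country}?", "Paris")
    return Step(None, trace[-1].verified_answer)


answer, trace = kd_cot(
    "What is the capital of the country where the Louvre is located?",
    toy_llm,
    toy_qa_system,
)
```

In this toy run the first draft answer is wrong on purpose; because the second sub-question is built from the verified answer rather than the draft, the error does not propagate to later rounds.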