Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. To mitigate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach that leverages external knowledge sources such as knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It effectively addresses the challenge of aligning query texts with KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform a query into a desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that efficiently identifies the top-$k$ subgraphs within one second on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification, while offering superior plug-and-play usability and scalability.
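The abstract does not spell out the GSD formula. The sketch below illustrates the pattern-to-subgraph idea under assumptions that are ours, not the paper's. It treats GSD as the sum of embedding distances between aligned entities and aligned relations. It uses Euclidean distance (`math.dist`) and assumes each candidate subgraph arrives already aligned triple-by-triple with the pattern. It ranks candidates by brute force. The names `graph_semantic_distance`, `top_k_subgraphs`, and the `embed` lookup table are hypothetical. This is not the optimized retrieval algorithm the abstract describes.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical data model: a graph pattern or KG subgraph is a list of
# (head, relation, tail) triples; every entity/relation label maps to a vector.
Triple = Tuple[str, str, str]
Embeddings = Dict[str, Sequence[float]]


def graph_semantic_distance(pattern: List[Triple], subgraph: List[Triple],
                            embed: Embeddings) -> float:
    """Sum of embedding distances between aligned pattern/subgraph elements.

    Assumes subgraph[i] is already structurally aligned with pattern[i];
    each pattern node is counted once and each edge once.
    """
    if len(pattern) != len(subgraph):
        raise ValueError("pattern and subgraph must have the same number of triples")
    node_map: Dict[str, str] = {}  # pattern node -> subgraph node
    used: Set[str] = set()         # subgraph nodes already matched (keeps mapping one-to-one)
    total = 0.0
    for (p_head, p_rel, p_tail), (s_head, s_rel, s_tail) in zip(pattern, subgraph):
        total += math.dist(embed[p_rel], embed[s_rel])  # relation-to-relation distance
        for p_node, s_node in ((p_head, s_head), (p_tail, s_tail)):
            if p_node in node_map:
                if node_map[p_node] != s_node:
                    raise ValueError(f"inconsistent alignment for {p_node!r}")
                continue  # node already counted
            if s_node in used:
                raise ValueError(f"{s_node!r} matched to two pattern nodes")
            node_map[p_node] = s_node
            used.add(s_node)
            total += math.dist(embed[p_node], embed[s_node])  # entity-to-entity distance
    return total


def top_k_subgraphs(pattern: List[Triple], candidates: List[List[Triple]],
                    embed: Embeddings, k: int) -> List[List[Triple]]:
    """Brute-force top-k: score every candidate and keep the k smallest GSDs."""
    scored = [(graph_semantic_distance(pattern, c, embed), i)
              for i, c in enumerate(candidates)]
    return [candidates[i] for _, i in heapq.nsmallest(k, scored)]
```

Ranking by the smallest distance lets a candidate win when its entities and relations are all semantically close to the pattern, even if no label matches exactly. The scan above scores every candidate only to show what is being ranked. It makes no attempt to reach the latency reported in the abstract.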