Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems

Despite early successes and a variety of architectures, retrieval-augmented generation (RAG) systems still struggle to reliably retrieve and connect the multi-step evidence required for complex reasoning tasks. Most standard RAG frameworks treat all retrieved information as equally reliable, overlooking the varying credibility and interconnected nature of large textual corpora. GraphRAG approaches can improve RAG systems by integrating knowledge graphs, which structure information into nodes and edges, capture entity relationships, and enable multi-step logical traversal. However, GraphRAG is not always an ideal solution, because it depends on a high-quality graph representation of the corpus. Such representations usually rely either on manually curated knowledge graphs, which are costly to construct and update, or on automated graph-construction pipelines, which are often unreliable. Moreover, systems following this paradigm typically use large language models (LLMs) to guide graph traversal and evidence retrieval. In this paper, we propose a novel RAG framework that uses a spreading activation algorithm to retrieve information from a corpus of documents connected by an automatically constructed heterogeneous knowledge graph. This approach reduces reliance on semantic knowledge graphs, which are often incomplete because information is lost during extraction, avoids LLM-guided graph traversal, and improves performance on multi-hop question answering. Experiments show that our method performs better than or comparably to several state-of-the-art RAG methods and can be integrated as a plug-and-play module into different iterative RAG pipelines. When combined with chain-of-thought iterative retrieval, it yields up to a 39% absolute improvement in answer correctness over naive RAG, and it achieves these results with small open-weight language models.
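To make the core idea concrete, the following is a minimal sketch of generic, textbook-style spreading activation over a weighted graph. It is not the formulation used in this paper: the function name `spread_activation`, the parameters `decay`, `threshold`, and `max_steps`, the fire-once rule, and the toy graph are all illustrative assumptions. Activation starts at seed nodes (for example, entities matched in the question), flows along edges while being attenuated at each hop, and the document nodes that accumulate the most activation are treated as retrieval candidates.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Generic spreading activation (illustrative, not the paper's algorithm).

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    seeds: dict mapping node -> initial activation in [0, 1].
    Returns a dict mapping every reached node to its final activation.
    """
    activation = defaultdict(float)
    activation.update(seeds)
    frontier = set(seeds)
    fired = set()  # each node propagates at most once (an assumed design choice)

    for _ in range(max_steps):
        incoming = defaultdict(float)
        for node in frontier:
            # Skip nodes that already fired or are too weakly activated.
            if node in fired or activation[node] < threshold:
                continue
            fired.add(node)
            for neighbor, weight in graph.get(node, []):
                # Activation is attenuated by the edge weight and a global decay.
                incoming[neighbor] += activation[node] * weight * decay
        if not incoming:
            break
        for node, value in incoming.items():
            # Cap activation at 1.0 so heavily connected nodes do not blow up.
            activation[node] = min(1.0, activation[node] + value)
        frontier = set(incoming)

    return dict(activation)


# Hypothetical toy graph: one question entity, one bridge entity, two documents.
toy_graph = {
    "q_entity": [("doc_a", 1.0), ("entity_b", 0.8)],
    "entity_b": [("doc_b", 1.0)],
    "doc_a": [],
    "doc_b": [],
}
scores = spread_activation(toy_graph, {"q_entity": 1.0})

# Rank only the document nodes by accumulated activation.
ranked_docs = sorted(
    (n for n in scores if n.startswith("doc_")),
    key=lambda n: scores[n],
    reverse=True,
)
```

In this toy setup, `doc_b` is reachable only through the bridge node `entity_b`, so it still receives some activation after two hops. That is the kind of multi-hop evidence a purely similarity-based retriever could miss, and no language model is consulted during the traversal.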
