Owing to their unprecedented comprehension capabilities, large language models (LLMs) have become indispensable components of modern web search engines. From a technical perspective, this integration represents retrieval-augmented generation (RAG), which enhances LLMs by grounding them in external knowledge bases. A prevalent technical approach in this context is graph-based RAG (G-RAG). However, current G-RAG methodologies frequently underutilize graph topology, predominantly focusing on low-order structures or pre-computed static communities. This limitation undermines their effectiveness on dynamic and complex queries. We therefore propose DA-RAG, which leverages attributed community search (ACS) to dynamically extract subgraphs relevant to the queried question. DA-RAG captures high-order graph structures, enabling the retrieval of self-complementary knowledge. Furthermore, DA-RAG is equipped with a chunk-layer oriented graph index, which facilitates efficient multi-granularity retrieval while significantly reducing both computational and economic costs. We evaluate DA-RAG on multiple datasets, demonstrating that it outperforms existing RAG methods by up to 40% in head-to-head comparisons across four metrics while reducing index construction time and token overhead by up to 37% and 41%, respectively.
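To make the query-driven retrieval idea concrete, the following is a minimal, self-contained sketch of attributed community search over a toy graph. This is not DA-RAG's actual algorithm: the keyword attributes, the overlap-based attribute filter, and the use of a k-core as the cohesive-community criterion are all illustrative assumptions introduced here for exposition.

```python
def acs_retrieve(edges, attrs, query_keywords, k=2):
    """Return nodes of a dense subgraph whose attributes match the query.

    A toy stand-in for attributed community search (ACS): first filter
    nodes by attribute relevance to the query, then keep only the k-core
    of the induced subgraph so the result is structurally cohesive.
    """
    # 1) Attribute filter: keep nodes whose keyword set overlaps the query.
    matched = {n for n, kw in attrs.items() if kw & query_keywords}
    adj = {n: set() for n in matched}
    for u, v in edges:
        if u in matched and v in matched:
            adj[u].add(v)
            adj[v].add(u)
    # 2) Structure filter: peel to the k-core by repeatedly dropping
    #    nodes with fewer than k surviving neighbors.
    changed = True
    while changed:
        changed = False
        for n in list(adj):
            if len(adj[n]) < k:
                for m in adj.pop(n):
                    adj[m].discard(n)
                changed = True
    return set(adj)

# Hypothetical knowledge graph: nodes carry keyword attributes.
attrs = {
    "a": {"graph", "rag"},
    "b": {"graph", "index"},
    "c": {"rag", "llm"},
    "d": {"cooking"},
}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

# Node "d" fails the attribute filter; "a", "b", "c" form a 2-core triangle.
print(sorted(acs_retrieve(edges, attrs, {"graph", "rag"}, k=2)))  # ['a', 'b', 'c']
```

Because the community is recomputed per query rather than read from a precomputed static partition, different questions can surface different subgraphs over the same underlying graph, which is the intuition behind ACS-based retrieval.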