Recent advances in graph learning have enabled retrieval-augmented generation (RAG) systems that exploit the relational structure inherent in graph data. However, many existing approaches are tied to fixed settings and carry significant engineering overhead, which limits their adaptability and scalability. The RAG community has also largely overlooked decades of research from the graph database community on efficiently retrieving interesting substructures from large-scale graphs. In this work, we introduce the RAG-on-Graphs Library (RGL), a modular framework that integrates the complete RAG pipeline (efficient graph indexing, dynamic node retrieval, subgraph construction, tokenization, and final generation) into a unified system. RGL supports a variety of graph formats and integrates optimized implementations of essential components, achieving speedups of up to 143x over conventional methods. Its flexible utilities, such as dynamic node filtering, allow rapid extraction of pertinent subgraphs while reducing token consumption. Our extensive evaluations demonstrate that RGL not only accelerates prototyping but also improves the performance and applicability of graph-based RAG systems across a range of tasks.
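To make the pipeline stages named above concrete, the following is a minimal pure-Python sketch of indexing, seed-node retrieval, subgraph construction, node filtering, and linearization into a prompt. It is an illustration under stated assumptions and does not use RGL's actual API. All function names, the toy graph, and the bag-of-words similarity (a stand-in for learned embeddings and an optimized index) are hypothetical, and the final generation step is left out because it depends on the language model in use.

```python
from collections import Counter, deque
import math

# Hypothetical toy graph: node id -> text, plus undirected edges.
NODE_TEXT = {
    0: "graph neural networks for node classification",
    1: "retrieval augmented generation with language models",
    2: "subgraph retrieval for question answering",
    3: "image segmentation with convolutional networks",
    4: "graph databases and efficient subgraph matching",
}
EDGES = [(0, 2), (1, 2), (2, 4), (0, 3)]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def embed(text):
    """Bag-of-words vector; an assumed stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_nodes(nodes, query, index):
    """Sort nodes by similarity to the query, breaking ties by node id."""
    q = embed(query)
    return sorted(nodes, key=lambda n: (-cosine(q, index[n]), n))


def expand_subgraph(seeds, adj, hops):
    """Breadth-first expansion up to `hops` edges away from the seeds."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nb in sorted(adj.get(node, ())):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def linearize(nodes, edges, node_text, query):
    """Turn the kept nodes and their induced edges into a text prompt."""
    keep = set(nodes)
    lines = [f"[{n}] {node_text[n]}" for n in nodes]
    lines += [f"{u} -- {v}" for u, v in edges if u in keep and v in keep]
    return "Context:\n" + "\n".join(lines) + f"\nQuestion: {query}"


# Indexing: embed every node once.
index = {n: embed(t) for n, t in NODE_TEXT.items()}
adj = build_adjacency(EDGES)

query = "subgraph retrieval on graph databases"
seeds = rank_nodes(index, query, index)[:1]       # node retrieval
subgraph = expand_subgraph(seeds, adj, hops=2)    # subgraph construction
kept = rank_nodes(subgraph, query, index)[:3]     # node filtering (size budget)
prompt = linearize(kept, EDGES, NODE_TEXT, query) # tokenization input
print(prompt)
```

In this sketch, the filtering step caps the number of nodes passed to the prompt, which is one simple way to trade context coverage against token consumption. A production system would replace the linear scan in `rank_nodes` with an approximate nearest-neighbor index.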