Retrieval-Augmented Generation (RAG) gives large language models an effective way to access external knowledge. However, existing methods rely on dense similarity retrieval, which has inherent limitations in handling structured constraints and multi-hop reasoning. Incorporating knowledge graphs partially alleviates these issues, but at the cost of semantic fragmentation, high maintenance overhead, and difficult incremental updates. This paper introduces SAG (SQL Retrieval-Augmented Generation), a structured architecture for retrieval and agent systems. Instead of pre-building a global static graph, SAG converts each chunk into one semantically complete event and a set of indexing entities, then uses SQL join queries to link events that share entities into local hyperedges, dynamically instantiating a local index structure at query time. This design removes the need for global graph rebuilding and ongoing maintenance, and because it relies on standard database infrastructure, the system naturally supports incremental writes, concurrent processing, and continuous scaling. On three standard multi-hop benchmarks, HotpotQA, 2WikiMultiHop, and MuSiQue, SAG achieves the best results on 8 of 9 Recall@K metrics, reaching 80.0% Recall@5 on MuSiQue, the benchmark with the highest multi-hop reasoning demands. SAG has also been deployed in production at a scale of hundreds of millions of data items, with online retrieval latency kept within seconds. The project site and code are available at https://github.com/Zleap-AI/SAG-Benchmark.
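To make the event/entity design concrete, here is a minimal sketch under stated assumptions, not SAG's actual implementation. It assumes a hypothetical three-table schema (`events`, `entities`, `event_entities`) and hypothetical helpers `add_event` and `local_hyperedges`; SQLite stands in for the production database, and the entity lists are supplied by hand because the abstract does not describe how events and entities are extracted or how seed events are retrieved. Writing an event is a plain insert (no global graph rebuild), and at query time a single self-join on `event_entities` groups every event sharing an entity with a seed event into one local hyperedge.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one event per chunk, plus a many-to-many table
# linking each event to the entities that index it.
SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE event_entities (
    event_id INTEGER REFERENCES events(id),
    entity_id INTEGER REFERENCES entities(id),
    PRIMARY KEY (event_id, entity_id)
);
CREATE INDEX idx_ee_entity ON event_entities(entity_id);
"""


def add_event(conn, text, entity_names):
    """Incremental write: insert one event and link its indexing entities.
    Only new rows are added; nothing global is rebuilt."""
    cur = conn.execute("INSERT INTO events (text) VALUES (?)", (text,))
    event_id = cur.lastrowid
    for name in entity_names:
        # Reuse the entity row if it already exists.
        conn.execute("INSERT OR IGNORE INTO entities (name) VALUES (?)", (name,))
        (entity_id,) = conn.execute(
            "SELECT id FROM entities WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO event_entities VALUES (?, ?)",
            (event_id, entity_id),
        )
    return event_id


def local_hyperedges(conn, seed_event_ids):
    """Query time: for each entity of a seed event, join to every event that
    shares it. An entity linking two or more events forms one local hyperedge."""
    if not seed_event_ids:
        return {}
    placeholders = ",".join("?" * len(seed_event_ids))
    rows = conn.execute(
        f"""
        SELECT DISTINCT en.name, ee2.event_id
        FROM event_entities AS ee1
        JOIN event_entities AS ee2 ON ee2.entity_id = ee1.entity_id
        JOIN entities AS en ON en.id = ee1.entity_id
        WHERE ee1.event_id IN ({placeholders})
        """,
        list(seed_event_ids),
    ).fetchall()
    edges = defaultdict(set)
    for name, event_id in rows:
        edges[name].add(event_id)
    # Drop entities that touch only a single event: they link nothing.
    return {name: ids for name, ids in edges.items() if len(ids) >= 2}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
e1 = add_event(conn, "Film X was directed by Alice.", ["Film X", "Alice"])
e2 = add_event(conn, "Alice was born in Paris.", ["Alice", "Paris"])
e3 = add_event(conn, "Paris is the capital of France.", ["Paris", "France"])
print(local_hyperedges(conn, [e1]))  # {'Alice': {1, 2}}
```

Calling `local_hyperedges` again with the newly reached events (for example `[e1, e2]`) also surfaces the `Paris` hyperedge `{2, 3}`. This illustrates how repeated joins could expand the local structure hop by hop without maintaining a persistent global graph.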