Retrieval-Augmented Generation (RAG) enhances language models by retrieving external knowledge to support informed and grounded responses. However, traditional RAG methods rely on fragment-level retrieval, which limits their ability to handle query-focused summarization. GraphRAG introduces a graph-based paradigm for global knowledge reasoning, yet it suffers from inefficient information extraction, high resource consumption, and poor adaptability to incremental updates. To overcome these limitations, we propose TagRAG, a tag-guided hierarchical knowledge graph RAG framework designed for efficient global reasoning and scalable graph maintenance. TagRAG introduces two key components: (1) Tag Knowledge Graph Construction, which extracts object tags and their relationships from documents and organizes them into hierarchical domain tag chains for structured knowledge representation, and (2) Tag-Guided Retrieval-Augmented Generation, which retrieves domain-centric tag chains to localize and synthesize relevant knowledge during inference. This design adapts well to smaller language models, improves retrieval granularity, and supports efficient incremental knowledge updates. Extensive experiments on UltraDomain datasets spanning Agriculture, Computer Science, Law, and cross-domain settings demonstrate that TagRAG achieves an average winning rate of 78.36% against baselines, with roughly 14.6x the construction efficiency and 1.9x the retrieval efficiency of GraphRAG.
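To make the two components concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that tag chains are already available as root-to-leaf lists of strings (in TagRAG they would be extracted from documents by a language model) and it substitutes simple word overlap for the actual retrieval step. The class `TagGraph` and its methods `add_chain`, `chain_of`, and `retrieve` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class TagGraph:
    """Toy hierarchical tag graph: every tag has at most one parent tag."""

    def __init__(self):
        self.parent = {}                     # tag -> parent tag (None for a domain root)
        self.passages = defaultdict(list)    # tag -> passages attached to that tag

    def add_chain(self, chain, passage):
        """Register a root-to-leaf tag chain and attach the passage to its leaf tag.

        New chains are merged into the existing hierarchy, so adding a document
        does not require rebuilding the graph (assumed stand-in for incremental updates).
        """
        prev = None
        for tag in chain:
            self.parent.setdefault(tag, prev)  # keep the first parent seen for a tag
            prev = tag
        self.passages[chain[-1]].append(passage)

    def chain_of(self, tag):
        """Follow parent links to rebuild the chain from the domain root to `tag`."""
        chain = []
        while tag is not None:
            chain.append(tag)
            tag = self.parent[tag]
        return chain[::-1]

    def retrieve(self, query):
        """Return (chain, passages) for each tag whose chain shares a word with the query."""
        words = set(query.lower().split())
        hits = []
        for tag, texts in self.passages.items():
            chain = self.chain_of(tag)
            if words & {t.lower() for t in chain}:
                hits.append((chain, texts))
        return hits


# Hypothetical example data, invented for this sketch.
graph = TagGraph()
graph.add_chain(["agriculture", "soil", "irrigation"], "Drip irrigation reduces water use.")
graph.add_chain(["agriculture", "soil", "fertilizer"], "Nitrogen fertilizer raises yield.")
graph.add_chain(["law", "contracts"], "A contract requires offer and acceptance.")

hits = graph.retrieve("how does irrigation work")
for chain, texts in hits:
    print(" > ".join(chain), texts)
```

In this sketch a query that names a mid-level tag such as "soil" returns every chain passing through that tag, which loosely mirrors the idea of using domain-centric tag chains to localize relevant knowledge before synthesis.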