Retrieval-augmented generation (RAG) has shown some success in augmenting large language models (LLMs) with external knowledge. However, as a non-parametric knowledge integration paradigm, RAG relies heavily on external retrieval modules and on the retrieved text supplied as a context prior. For very large-scale knowledge augmentation in particular, it introduces substantial inference latency because of expensive searches and much longer contexts. In this paper, we propose \textbf{AtlasKV}, a parametric knowledge integration method that offers a scalable, effective, and general way to augment LLMs with billion-scale knowledge graphs (KGs) (e.g., 1B triples) at very low GPU memory cost (e.g., less than 20GB of VRAM). AtlasKV introduces KG2KV and HiKVP to integrate KG triples into LLMs at scale with sub-linear time and memory complexity. It maintains strong knowledge grounding and generalization performance using the LLMs' inherent attention mechanism, and it requires no external retrievers, long context priors, or retraining when adapting to new knowledge.