Large language models (LLMs) have shown remarkable language processing and reasoning ability but are prone to hallucinate when asked about private data. Retrieval-augmented generation (RAG) retrieves relevant data that fits into an LLM's context window and prompts the LLM for an answer. GraphRAG extends this approach to structured Knowledge Graphs (KGs) and to questions about entities multiple hops away. The majority of recent GraphRAG methods either overlook the retrieval step or have ad hoc retrieval processes that are abstract or inefficient. This prevents them from being adopted when the KGs are stored in graph databases supporting graph query languages. In this work, we present GraphRAFT, a retrieve-and-reason framework that finetunes LLMs to generate provably correct Cypher queries to retrieve high-quality subgraph contexts and produce accurate answers. Our method is the first such solution that can be taken off-the-shelf and used on KGs stored in native graph DBs. Benchmarks suggest that our method is sample-efficient and scales with the availability of training data. Our method achieves significantly better results than all state-of-the-art models across all four standard metrics on two challenging Q\&A benchmarks over large text-attributed KGs.
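To make the retrieve-and-reason pattern described above concrete, here is a minimal, hypothetical Python sketch of generic GraphRAG retrieval: gather a k-hop subgraph around seed entities, serialize it as triples, and assemble a prompt for an LLM. The toy KG, entity names, and helper functions are illustrative stand-ins only; the paper's actual method instead finetunes an LLM to emit Cypher queries against a native graph database, which this sketch does not implement.

```python
from collections import deque

# Toy text-attributed KG: node -> list of (relation, neighbor) edges.
# Purely illustrative data, not from the paper's benchmarks.
KG = {
    "Marie Curie": [("won", "Nobel Prize in Physics"), ("spouse", "Pierre Curie")],
    "Pierre Curie": [("won", "Nobel Prize in Physics")],
    "Nobel Prize in Physics": [("awarded_in", "Stockholm")],
    "Stockholm": [],
}

def k_hop_subgraph(graph, seeds, k):
    """BFS out to k hops; return the set of (head, relation, tail) triples visited."""
    triples = set()
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for rel, nbr in graph.get(node, []):
            triples.add((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return triples

def build_prompt(question, triples):
    """Serialize the retrieved subgraph as plain-text context for an LLM."""
    context = "\n".join(f"{h} --{r}--> {t}" for h, r, t in sorted(triples))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

triples = k_hop_subgraph(KG, ["Marie Curie"], k=2)
prompt = build_prompt("Where was Marie Curie's prize awarded?", triples)
```

In the paper's setting the hand-written `k_hop_subgraph` traversal would be replaced by a Cypher query (e.g. a `MATCH` pattern with a variable-length relationship) generated by the finetuned LLM and executed by the graph database, so the retrieval is both expressed declaratively and checked for correctness before execution.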