Large Language Models (LLMs) have been driving progress in AI at an unprecedented rate, yet they still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, and the latter requires domain expertise. External knowledge infusion is task-specific and requires model training. Here, we introduce a task-agnostic Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework that combines the massive biomedical KG SPOKE with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4 to generate meaningful biomedical text rooted in established knowledge. KG-RAG consistently enhanced the performance of LLMs across various prompt types, including one-hop and two-hop prompts, drug repurposing queries, biomedical true/false questions, and multiple-choice questions (MCQ). Notably, KG-RAG provides a 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters on domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models such as GPT-3.5, which outperformed GPT-4 in context utilization on the MCQ data. Our approach was also able to address drug repurposing questions, returning meaningful repurposing suggestions. In summary, the proposed framework combines the explicit knowledge of the KG with the implicit knowledge of the LLM in an optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a unified framework.
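
To make the retrieve-then-generate idea concrete, the following is a minimal sketch of a KG-grounded prompt builder. It is not the KG-RAG implementation: the three-triple `TOY_KG` is a placeholder standing in for SPOKE, the substring-based entity matching is a deliberately naive assumption, and the names `Triple`, `find_entities`, `retrieve_context` and `build_prompt` are hypothetical. No LLM is called; the sketch stops at the augmented prompt that would be passed to a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Toy stand-in for a biomedical KG; placeholder entries, not real SPOKE data.
TOY_KG = [
    Triple("DiseaseA", "ASSOCIATES", "GeneX"),
    Triple("DrugB", "TREATS", "DiseaseA"),
    Triple("DrugC", "TARGETS", "GeneX"),
]


def find_entities(question, kg):
    """Naive entity linking: KG node names that appear verbatim in the question."""
    nodes = {t.subject for t in kg} | {t.obj for t in kg}
    lowered = question.lower()
    return sorted(n for n in nodes if n.lower() in lowered)


def retrieve_context(entities, kg):
    """Collect one-hop triples touching any matched entity, as plain sentences."""
    wanted = set(entities)
    return [
        f"{t.subject} {t.predicate} {t.obj}."
        for t in kg
        if t.subject in wanted or t.obj in wanted
    ]


def build_prompt(question, kg):
    """Prepend retrieved graph context to the question (the augmentation step)."""
    context = retrieve_context(find_entities(question, kg), kg)
    context_text = "\n".join(context) if context else "(no graph context found)"
    return (
        f"Context:\n{context_text}\n\n"
        f"Question: {question}\n"
        "Answer using the context above."
    )


if __name__ == "__main__":
    print(build_prompt("Which drugs treat DiseaseA?", TOY_KG))
```

In this sketch only the triples that touch an entity mentioned in the question reach the prompt, which illustrates how explicit graph knowledge can ground a general-purpose model without any additional training.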