Recent advancements in large language models (LLMs) have achieved promising performance across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability in studying Alzheimer's Disease (AD), a specialized sub-field of biomedicine and a global health priority. With a synergized framework in which the LLM and KG mutually enhance each other, we first leverage the LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment the LLM's inference capabilities. Experimental results on our constructed AD question answering (ADQA) benchmark underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that offer valuable insights and guidelines for the emerging topic of mutually enhancing KGs and LLMs. We will release the code and data at https://github.com/David-Li0406/DALK.