Recent advances in large language models (LLMs) have yielded promising performance across a wide range of applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK (Dynamic Co-Augmentation of LLMs and KG) to address this limitation and demonstrate its effectiveness in studying Alzheimer's Disease (AD), a specialized sub-field of biomedicine and a global health priority. Within a synergistic framework in which the LLM and the KG mutually enhance each other, we first leverage the LLM to construct an evolving AD-specific knowledge graph (KG) from AD-related scientific literature, and then apply a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment the LLM's inference capabilities. Experimental results on our constructed AD question answering (ADQA) benchmark underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that offer valuable insights and guidelines for the emerging topic of mutually enhancing KGs and LLMs. We will release the code and data at https://github.com/David-Li0406/DALK.