Large Language Models (LLMs) have demonstrated remarkable potential in medical knowledge acquisition and question answering. However, LLMs can hallucinate and yield factually incorrect outputs, even with domain-specific pretraining. Retrieval-augmented generation (RAG) has previously had limited success in addressing these hallucinations. Unlike prior RAG methods, in which the retrieval model is trained separately from the LLM, we introduce JMLR (which Jointly trains the LLM and information Retrieval) during the fine-tuning phase. This synchronized training mechanism enhances JMLR's ability to retrieve clinical guidelines and leverage medical knowledge to reason about and answer questions, while reducing the demand for computational resources. We evaluated JMLR on the important application of medical question answering. Our experimental results demonstrate that JMLR-13B (70.5%) outperforms the previous state-of-the-art open-source model built with conventional pre-training and fine-tuning, Meditron-70B (68.9%), as well as Llama2-13B with RAG (67.7%), on a medical question-answering dataset. Comprehensive evaluations reveal that JMLR-13B improves reasoning quality and reduces hallucinations more effectively than Claude3-Opus. Additionally, JMLR-13B (148 GPU hours) trains far faster than Meditron-70B (42,630 GPU hours). Through this work, we provide a new and efficient knowledge-enhancement method for healthcare, demonstrating the potential of integrating retrieval and LLM training for medical question-answering systems.