Large Language Models (LLMs) have demonstrated remarkable potential in medical knowledge acquisition and question answering. However, LLMs can hallucinate and produce factually incorrect output, even with domain-specific pretraining. Retrieval-augmented generation (RAG) has so far had limited success in addressing hallucinations. Unlike previous RAG methods, in which the retrieval model is trained separately from the LLM, we introduce JMLR, which jointly trains the LLM and the information retrieval (IR) model during the fine-tuning phase. This synchronized training mechanism enhances JMLR's ability to retrieve clinical guidelines and leverage medical knowledge to reason and answer questions, and it reduces the demand for computational resources. We evaluated JMLR on medical question answering, an important application. Our experimental results demonstrate that, on a medical question-answering dataset, JMLR-13B (70.5%) outperforms both Meditron-70B (68.9%), a previous state-of-the-art open-source model built with conventional pre-training and fine-tuning, and Llama2-13B with RAG (54.9%). JMLR-13B also requires far less training compute than Meditron-70B: 148 GPU hours versus 42,630 GPU hours, roughly a 288-fold reduction. Through this work, we provide a new and efficient knowledge enhancement tool for healthcare, demonstrating the potential of integrating IR and LLM training for medical question-answering systems.