Large-scale language models (LLMs), such as ChatGPT, can generate human-like responses for various downstream tasks, such as task-oriented dialogue and question answering. However, applying LLMs to the medical domain remains challenging because they cannot readily leverage domain-specific knowledge. In this study, we present Large-scale Language Models Augmented with Medical Textbooks (LLM-AMT), a system that builds on authoritative medical textbooks and improves proficiency in this specialized domain through plug-and-play modules: a Hybrid Textbook Retriever, supplemented by a Query Augmenter and an LLM Reader. Experimental evaluation on three open-domain medical question-answering tasks shows that LLM-AMT substantially improves both the professionalism and the accuracy of LLM responses, with an improvement ranging from 11.4% to 13.2%. We also find that medical textbooks, despite forming a retrieval corpus 100 times smaller, are a more valuable external knowledge source than Wikipedia in the medical domain: textbook augmentation yields a performance improvement ranging from 9.7% to 12.2% over Wikipedia augmentation.
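To make the plug-and-play structure concrete, the following is a minimal sketch of how a Query Augmenter, a Hybrid Textbook Retriever, and an LLM Reader might be chained. It is not the implementation evaluated in this study. Every function name, the synonym table, the toy passages, and the weight `alpha` are hypothetical. We assume here that "hybrid" means mixing a sparse lexical score with a dense-style similarity score; a character-trigram cosine stands in for a learned dense retriever, and the reader step only builds the prompt that would be sent to an LLM.

```python
import math
from collections import Counter

# Toy stand-in for a textbook corpus; the passages are illustrative only.
PASSAGES = [
    "Metformin is a first-line oral agent for type 2 diabetes mellitus.",
    "Beta blockers reduce heart rate and myocardial oxygen demand.",
    "Iron deficiency anemia presents with microcytic hypochromic red cells.",
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def augment_query(query, synonyms):
    """Hypothetical Query Augmenter: append expansions of known query terms."""
    extra = [s for t in tokenize(query) for s in synonyms.get(t, [])]
    return query + " " + " ".join(extra) if extra else query


def lexical_score(query, passage):
    """Sparse signal: fraction of distinct query tokens found in the passage."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q) if q else 0.0


def trigram_vector(text):
    """Character-trigram counts, used as a crude stand-in for an embedding."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query, passages, k=1, alpha=0.5):
    """Hypothetical hybrid retriever: weighted mix of the two scores above."""
    q_vec = trigram_vector(query)
    scored = [
        (alpha * lexical_score(query, p)
         + (1 - alpha) * cosine(q_vec, trigram_vector(p)), p)
        for p in passages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_reader_prompt(question, evidence):
    """Hypothetical LLM Reader input: retrieved evidence followed by the question."""
    context = "\n".join(f"- {e}" for e in evidence)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


question = "Which drug is first-line for T2DM?"
synonyms = {"t2dm": ["type", "2", "diabetes", "mellitus"]}  # assumed expansion table
augmented = augment_query(question, synonyms)
evidence = hybrid_retrieve(augmented, PASSAGES, k=1)
prompt = build_reader_prompt(question, evidence)
print(prompt)
```

Because each stage only consumes the previous stage's output, any one of them can be swapped (for example, replacing the trigram cosine with a real dense encoder) without touching the others, which is the property the term plug-and-play refers to.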