From Beginner to Expert: Modeling Medical Knowledge into General LLMs

Qiang Li, Xiaoyan Yang, Haowen Wang, Qin Wang, Lei Liu, Junjie Wang, Yang Zhang, Mingyuan Chu, Sen Hu, Yicheng Chen, Yue Shen, Cong Fan, Wangshu Zhang, Teng Xu, Jinjie Gu, Jing Zheng, Guannan Zhang (Ant Group)

From arXiv. Developed by Ant Group for the PubMedQA leaderboard.

Recently, artificial intelligence (AI) systems based on large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge in sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B parameters) to learn more general medical knowledge, while there is still room for improvement in smaller LLMs (<100B parameters). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B) using a 3-stage optimization procedure: general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question-answering, medical reasoning, multi-choice questions, and medical conversations. (3) Specifically for multi-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model outperforms most LLMs on PubMedQA, including both general and medical LLMs, even when these LLMs have larger model sizes.
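
The abstract names Verification-of-Choice but does not describe its mechanics. The following is a minimal sketch of one plausible reading, not the authors' implementation: each candidate answer is verified in its own prompt, and the best-supported candidate is selected. The prompt wording, the function names `build_verification_prompts` and `pick_choice`, and the `score_fn` callback (a stand-in for an LLM call that returns a support score) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def build_verification_prompts(
    context: str, question: str, choices: List[str]
) -> Dict[str, str]:
    """Build one verification prompt per candidate answer.

    The template is hypothetical; the abstract does not give the real one.
    """
    prompts = {}
    for choice in choices:
        prompts[choice] = (
            f"Context: {context}\n"
            f"Question: {question}\n"
            f"Proposed answer: {choice}\n"
            "Explain step by step whether the context supports "
            "the proposed answer."
        )
    return prompts


def pick_choice(
    context: str,
    question: str,
    choices: List[str],
    score_fn: Callable[[str], float],
) -> str:
    """Return the choice whose verification prompt scores highest.

    score_fn is an assumed placeholder for a model call that maps a
    prompt to a numeric support score.
    """
    prompts = build_verification_prompts(context, question, choices)
    scores = {choice: score_fn(prompt) for choice, prompt in prompts.items()}
    return max(scores, key=scores.get)
```

For PubMedQA, `choices` would be `["yes", "no", "maybe"]`, and `score_fn` would wrap whatever model is being evaluated.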

