Me LLaMA: Foundation Large Language Models for Medical Applications

Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, Xinyu Zhou, Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, Jiang Bian

From arXiv; 21 pages, 3 figures, 8 tables

Recent advances in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet in clinical settings they often show limitations because they lack specialized training on medical data. In response, this study introduces Me-LLaMA, a new medical LLM family that comprises the foundation models Me-LLaMA 13B and 70B and their chat-enhanced versions, Me-LLaMA 13B-chat and 70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 on large medical datasets. Our methodology draws on a comprehensive domain-specific data suite: a large-scale continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) covering six critical medical tasks across 12 datasets. Our extensive evaluation on MIBE shows that Me-LLaMA models achieve better overall performance than existing open-source medical LLMs in zero-shot, few-shot, and supervised learning settings. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. We also investigated catastrophic forgetting, and our results show that Me-LLaMA models mitigate it better than other open-source medical LLMs. Me-LLaMA is one of the largest open-source medical foundation LLMs that use both biomedical and clinical data. It performs better on both general and medical tasks than other open-source medical LLMs, making it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.
