Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. To enable their use in clinical settings, LLMs are typically further adapted through continued pretraining or post-training on clinical data. However, most medical LLMs are trained on data from a single institution, which limits their generalizability and safety in heterogeneous health systems. Federated learning (FL) is a promising solution for enabling collaborative model development across healthcare institutions. Yet applying FL to LLMs in medicine remains fundamentally limited. First, conventional FL requires transmitting the full model during each communication round, which becomes impractical for multi-billion-parameter LLMs given limited computational resources. Second, many FL algorithms implicitly assume data homogeneity, whereas real-world clinical data are highly heterogeneous across patients, diseases, and institutional practices. We introduce Fed-MedLoRA, a model-agnostic, parameter-efficient federated learning framework for adapting LLMs to medical applications. Fed-MedLoRA transmits only low-rank adapter parameters, reducing communication and computation overhead, while Fed-MedLoRA+ further incorporates adaptive, data-aware aggregation to improve convergence under cross-site heterogeneity. We apply the framework to clinical information extraction (IE), which transforms patient narratives into structured medical entities and relations. Accuracy was assessed across five patient cohorts through comparisons with BERT, LLaMA-3, DeepSeek-R1, and GPT-4o models. Evaluation settings included (1) in-domain training and testing, (2) external validation on independent cohorts, and (3) a low-resource new-site adaptation scenario using real-world clinical notes from the Yale New Haven Health System.
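To make the aggregation idea concrete, the following is a minimal sketch, not the paper's implementation, of how a federated server might combine per-client LoRA adapter matrices using data-size-aware weights (the function name `aggregate_lora`, the dictionary layout, and the matrix shapes are illustrative assumptions):

```python
import numpy as np

def aggregate_lora(client_adapters, client_sizes):
    """Data-size-weighted aggregation of per-client LoRA adapters.

    client_adapters: list of dicts mapping adapter names (e.g. "A", "B")
        to NumPy arrays; only these low-rank matrices are transmitted,
        never the full base model.
    client_sizes: number of local training examples per client, used to
        weight each client's contribution.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()  # normalize so the weights sum to 1
    aggregated = {}
    for name in client_adapters[0]:
        # Weighted average of the same adapter matrix across clients
        aggregated[name] = sum(
            w * adapters[name] for w, adapters in zip(weights, client_adapters)
        )
    return aggregated

# Two hypothetical sites with rank-3 adapters and unequal data volumes
site_adapters = [
    {"A": np.ones((3, 8)), "B": np.zeros((8, 3))},
    {"A": np.zeros((3, 8)), "B": np.ones((8, 3))},
]
global_adapter = aggregate_lora(site_adapters, client_sizes=[300, 100])
```

Because only the small `A` and `B` matrices cross the network, communication cost scales with the adapter rank rather than the full model size; the data-size weighting is one simple way to bias the global update toward sites with more local examples under heterogeneity.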