Personalized Federated Learning (pFL) has emerged as a promising way to address data heterogeneity across clients in federated learning (FL). However, existing pFL methods either (1) incur high communication and computation costs, or (2) overfit to local data, which can be limited in scope, leaving them vulnerable to test samples that evolve under natural distribution shifts. In this paper, we propose PerAda, a parameter-efficient pFL framework that reduces communication and computation costs and exhibits superior generalization performance, especially under test-time distribution shifts. PerAda reduces these costs by leveraging pretrained models and by updating and communicating only a small number of additional parameters in adapters. PerAda generalizes well because it regularizes each client's personalized adapter with a global adapter, while the global adapter uses knowledge distillation to aggregate generalized information from all clients. Theoretically, we provide generalization bounds that explain why PerAda improves generalization, and we prove its convergence to stationary points in non-convex settings. Empirically, PerAda achieves competitive personalized performance (+4.85% on CheXpert) and better out-of-distribution generalization (+5.23% on CIFAR-10-C) than baselines on datasets spanning natural and medical domains, while updating only 12.6% of the parameters per model through the adapter.
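To make the idea of regularizing a personalized adapter with a global adapter concrete, here is a minimal toy sketch. It assumes a quadratic (proximal) penalty of the form (lam / 2) * ||v - w||^2 and a toy quadratic local loss; the abstract does not state the exact form of the regularizer, so the penalty, the function names `personalized_step` and `local_grad`, and the parameters `lr` and `lam` are all illustrative assumptions rather than PerAda's actual implementation.

```python
# Hypothetical sketch: pull a client's personalized adapter toward a
# shared global adapter with a quadratic penalty. The penalty form and
# every name here are assumptions for illustration, not from the paper.


def personalized_step(personal, global_adapter, grad_local, lr=0.1, lam=1.0):
    """One gradient step on local_loss(v) + (lam / 2) * ||v - w||^2."""
    return [
        v - lr * (g + lam * (v - w))
        for v, w, g in zip(personal, global_adapter, grad_local)
    ]


def local_grad(adapter, target):
    """Gradient of a toy local loss 0.5 * ||v - target||^2."""
    return [v - t for v, t in zip(adapter, target)]


if __name__ == "__main__":
    global_adapter = [0.0, 0.0]  # stand-in for the shared adapter weights
    local_optimum = [2.0, -2.0]  # where the client's own data would pull
    personal = list(global_adapter)
    for _ in range(200):
        grad = local_grad(personal, local_optimum)
        personal = personalized_step(personal, global_adapter, grad)
    # With lam = 1.0 the iterate settles midway between the global adapter
    # and the purely local optimum.
    print(personal)
```

In this toy setting, a larger `lam` keeps the personalized weights closer to the global adapter, while `lam = 0` recovers purely local training, which is the overfitting-prone regime the abstract warns about.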