Personalized Federated Learning (pFL) has emerged as a promising solution to tackle data heterogeneity across clients in FL. However, existing pFL methods either (1) introduce high communication and computation costs or (2) overfit to local data, which is often limited in scope, leaving them vulnerable to test samples that have undergone natural distribution shifts. In this paper, we propose PerAda, a parameter-efficient pFL framework that reduces communication and computation costs and exhibits superior generalization performance, especially under test-time distribution shifts. PerAda reduces the costs by leveraging the power of pretrained models, updating and communicating only a small number of additional parameters from adapters. PerAda generalizes well because it regularizes each client's personalized adapter toward a global adapter, while the global adapter uses knowledge distillation to aggregate generalized information from all clients. Theoretically, we provide generalization bounds to explain why PerAda improves generalization, and we prove its convergence to stationary points under non-convex settings. Empirically, PerAda demonstrates competitive personalized performance (+4.85% on CheXpert) and better out-of-distribution generalization (+5.23% on CIFAR-10-C) on datasets across natural and medical domains compared with baselines, while updating only 12.6% of the parameters per model via adapters. Our code is available at https://github.com/NVlabs/PerAda.
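The abstract describes pulling each client's personalized adapter toward a shared global adapter during local training. A minimal sketch of that regularization idea is below, using a toy model with plain Python floats; the function name `local_adapter_step`, the proximal-term weight `lam`, and the scalar parameterization are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch (hypothetical names, scalar parameters) of regularizing a
# client's personalized adapter toward the global adapter: the local
# objective adds a proximal term (lam/2) * ||theta_p - theta_g||^2 to the
# client's task loss, so each gradient step also pulls theta_p toward theta_g.

def local_adapter_step(theta_p, theta_g, grad_task, lr=0.1, lam=0.5):
    """One gradient step on the personalized adapter parameters.

    theta_p   : personalized adapter parameters (list of floats)
    theta_g   : global adapter parameters, fixed during this step
    grad_task : gradient of the client's task loss w.r.t. theta_p
    lam       : strength of the proximal regularization term
    """
    return [
        p - lr * (g + lam * (p - g_ref))  # task gradient + proximal pull
        for p, g, g_ref in zip(theta_p, grad_task, theta_g)
    ]

# With a zero task gradient, repeated steps shrink the gap to theta_g,
# illustrating how the regularizer keeps personalization anchored to the
# globally aggregated adapter.
theta_p = [1.0, -2.0]
theta_g = [0.0, 0.0]
for _ in range(50):
    theta_p = local_adapter_step(theta_p, theta_g, grad_task=[0.0, 0.0])
print(theta_p)
```

In the full method, only the adapter parameters (not the frozen pretrained backbone) would be updated this way and communicated, which is what keeps the per-round cost low.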