Federated learning (FL) is an emerging machine learning paradigm that allows multiple parties to collaboratively train a shared model in a privacy-preserving manner. Existing horizontal FL methods generally assume that the FL server and clients hold the same model structure. However, due to system heterogeneity and the need for personalization, enabling clients to hold models with diverse structures has become an important direction. Existing model-heterogeneous FL approaches often require publicly available datasets and incur high communication and/or computational costs, which limit their performance. To address these limitations, we propose a simple but effective Federated Global prediction Header (FedGH) approach. It is a communication- and computation-efficient model-heterogeneous FL framework in which the FL server trains a shared, generalized global prediction header on representations extracted by the heterogeneous feature extractors of clients' models. Because the global prediction header is trained on representations from different clients, it acquires global knowledge, which is then transferred to the clients by replacing each client's local prediction header. We derive the non-convex convergence rate of FedGH. Extensive experiments on two real-world datasets demonstrate that FedGH achieves significantly better performance than seven state-of-the-art personalized FL models in both model-homogeneous and model-heterogeneous FL scenarios, outperforming the best-performing baseline by up to 8.87% (model-homogeneous FL) and 1.83% (model-heterogeneous FL) in average test accuracy, while reducing communication overhead by up to 85.53%.
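The data flow described in the abstract can be illustrated with a minimal, single-round sketch in plain Python. It is not the authors' implementation, and several details are assumptions made only for illustration: each client uploads one averaged representation per locally seen class, the global prediction header is a linear softmax classifier trained by full-batch gradient descent, the extractors are fixed random networks that are never trained, and the data are synthetic. All function and variable names (`make_extractor`, `class_mean_reps`, `train_header`, and so on) are hypothetical.

```python
import math
import random

NUM_CLASSES, IN_DIM, REP_DIM = 3, 5, 4

def make_extractor(hidden, seed):
    """Hypothetical extractor IN_DIM -> hidden -> REP_DIM; `hidden` differs per client."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(REP_DIM)]
    def layer(w, v):
        return [math.tanh(sum(a * b for a, b in zip(row, v))) for row in w]
    return lambda x: layer(w2, layer(w1, x))

def class_mean_reps(extractor, data):
    """Client side (assumed): average the representations of each locally seen class."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * REP_DIM)
        for i, v in enumerate(extractor(x)):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return [([v / counts[y] for v in s], y) for y, s in sums.items()]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def logits(header, r):
    w, b = header
    return [sum(wi * ri for wi, ri in zip(row, r)) + bc for row, bc in zip(w, b)]

def loss(header, pairs):
    """Mean cross-entropy of the header on (representation, label) pairs."""
    return -sum(math.log(softmax(logits(header, r))[y]) for r, y in pairs) / len(pairs)

def train_header(header, pairs, lr=0.3, steps=200):
    """Server side: full-batch gradient descent on cross-entropy for a linear header."""
    w, b = header
    for _ in range(steps):
        gw = [[0.0] * REP_DIM for _ in range(NUM_CLASSES)]
        gb = [0.0] * NUM_CLASSES
        for r, y in pairs:
            p = softmax(logits((w, b), r))
            for c in range(NUM_CLASSES):
                d = (p[c] - (1.0 if c == y else 0.0)) / len(pairs)
                gb[c] += d
                for i in range(REP_DIM):
                    gw[c][i] += d * r[i]
        # Build new lists so the caller's initial header is left untouched.
        w = [[w[c][i] - lr * gw[c][i] for i in range(REP_DIM)] for c in range(NUM_CLASSES)]
        b = [b[c] - lr * gb[c] for c in range(NUM_CLASSES)]
    return w, b

data_rng = random.Random(0)

def make_data(n):
    """Synthetic local dataset: the label shifts the mean of every input feature."""
    data = []
    for _ in range(n):
        y = data_rng.randrange(NUM_CLASSES)
        data.append(([data_rng.gauss(y, 0.5) for _ in range(IN_DIM)], y))
    return data

# Three clients whose extractors have different hidden widths (model heterogeneity).
clients = [{"extractor": make_extractor(hidden, seed), "data": make_data(30), "header": None}
           for seed, hidden in enumerate([4, 8, 16])]

# 1) Clients upload (representation, label) pairs; raw data never leaves a client.
uploaded = [pair for c in clients for pair in class_mean_reps(c["extractor"], c["data"])]

# 2) The server trains the shared global prediction header on the uploaded pairs.
init_header = ([[0.0] * REP_DIM for _ in range(NUM_CLASSES)], [0.0] * NUM_CLASSES)
global_header = train_header(init_header, uploaded)
initial_loss, final_loss = loss(init_header, uploaded), loss(global_header, uploaded)

# 3) Each client replaces its local prediction header with the global one.
for c in clients:
    c["header"] = global_header
```

In this sketch only a handful of low-dimensional vectors and one small header cross the network per round, which is the intuition behind the communication savings claimed above. A fuller version would also train each client's extractor locally between rounds so that representations from different clients gradually become compatible with the shared header; that step is omitted here for brevity.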