Federated Learning provides a privacy-preserving paradigm for distributed learning, but suffers from statistical heterogeneity across clients. Personalized Federated Learning (PFL) mitigates this issue by learning client-specific models. A widely adopted approach in PFL decomposes the neural network into a shared feature extractor and client-specific heads. While effective, this design induces a fundamental tradeoff: deep or expressive shared components hinder personalization, whereas large local heads exacerbate overfitting under limited per-client data. Most existing methods rely on rigid, shallow heads and therefore fail to navigate this tradeoff in a principled manner. In this work, we propose a boosting-inspired framework that enables smooth control of this tradeoff. Instead of training a single personalized model, we construct an ensemble of $T$ models for each client. Across boosting iterations, the depth of the personalized component is progressively increased, while its effective complexity is systematically controlled via low-rank factorization or width shrinkage. This design simultaneously limits overfitting and substantially reduces per-client bias by allowing increasingly expressive personalization. We provide a theoretical analysis that establishes generalization bounds with favorable dependence on the average local sample size and the total number of clients. Specifically, we prove that the complexity of the shared layers is effectively suppressed, while the dependence on the boosting horizon $T$ is controlled through parameter reduction. Notably, we provide a novel generalization guarantee for nonlinear decoupled PFL models. Extensive experiments on benchmark and real-world datasets (e.g., EMNIST, CIFAR-10/100, and Sent140) demonstrate that the proposed framework consistently outperforms state-of-the-art PFL methods under heterogeneous data distributions.
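To make the construction concrete, the following is a minimal PyTorch-style sketch of the per-client model described above: a shared feature extractor followed by an additive ensemble of $T$ personalized heads whose depth grows with the boosting stage while low-rank factorization caps their parameter count. All class and parameter names here are hypothetical illustrations; the abstract does not specify the actual architecture, boosting loss, or server aggregation rule.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear map factorized as W = V U with rank r, capping effective complexity."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.U = nn.Linear(d_in, rank, bias=False)
        self.V = nn.Linear(rank, d_out)

    def forward(self, x):
        return self.V(self.U(x))

def make_head(d_feat, n_classes, depth, rank):
    """Personalized head for boosting stage t: depth grows with t,
    while low-rank factorization keeps the parameter count small."""
    layers = []
    for _ in range(depth - 1):
        layers += [LowRankLinear(d_feat, d_feat, rank), nn.ReLU()]
    layers.append(LowRankLinear(d_feat, n_classes, rank))
    return nn.Sequential(*layers)

class BoostedPersonalizedModel(nn.Module):
    """One client's model: a shared extractor (synchronized across clients)
    plus an additive ensemble of T heads; the stage-t head has depth t."""
    def __init__(self, shared_extractor, d_feat, n_classes, T, rank):
        super().__init__()
        self.shared = shared_extractor
        self.heads = nn.ModuleList(
            make_head(d_feat, n_classes, depth=t, rank=rank)
            for t in range(1, T + 1)
        )

    def forward(self, x):
        z = self.shared(x)
        # Boosting-style additive prediction over the T personalized heads.
        return sum(head(z) for head in self.heads)
```

In this sketch, `rank` is the single knob that trades head expressiveness against per-client overfitting; width shrinkage would replace `LowRankLinear` with progressively narrower dense layers. The schedules for depth and rank across boosting rounds are design choices not fixed by the abstract.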