Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation

Existing works in federated learning (FL) often assume an ideal system with either full client participation or uniformly distributed client participation. In practice, however, some clients may never participate in FL training (i.e., incomplete client participation) due to a myriad of system heterogeneity factors. A popular approach to mitigating the impact of incomplete client participation is the server-assisted federated learning (SA-FL) framework, where the server is equipped with an auxiliary dataset. Although SA-FL has been empirically shown to be effective in addressing the incomplete client participation problem, a theoretical understanding of SA-FL is still lacking. Meanwhile, the ramifications of incomplete client participation in conventional FL are also poorly understood. These theoretical gaps motivate us to rigorously investigate SA-FL. Toward this end, we first show that conventional FL is {\em not} PAC-learnable under incomplete client participation in the worst case. Then, we show that the PAC-learnability of FL with incomplete client participation can indeed be revived by SA-FL, which theoretically justifies the use of SA-FL for the first time. Lastly, to provide practical guidance for SA-FL training under {\em incomplete client participation}, we propose the $\mathsf{SAFARI}$ (server-assisted federated averaging) algorithm, which enjoys the same linear convergence speedup guarantees as classic FL with ideal client participation assumptions and is the first SA-FL algorithm with a convergence guarantee. Extensive experiments on different datasets show that $\mathsf{SAFARI}$ significantly improves performance under incomplete client participation.
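To make the effect of incomplete client participation concrete, the following is a minimal toy sketch, not the $\mathsf{SAFARI}$ algorithm itself. It assumes scalar quadratic losses, a fixed set of clients that never participate, and a hypothetical mixing rule in which the server averages the participating clients' models with a model trained on its auxiliary data. The names `local_sgd`, `train`, and `server_weight` are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    """Run a few gradient steps on the loss 0.5 * mean((w - x)^2)."""
    for _ in range(steps):
        w = w - lr * (w - np.mean(data))  # gradient is w - mean(data)
    return w

def train(client_data, participating, server_data=None,
          server_weight=0.5, rounds=200):
    """FedAvg over participating clients, optionally mixed with a server update.

    The mixing rule below is an assumption made for illustration only.
    """
    w = 0.0
    for _ in range(rounds):
        client_avg = np.mean([local_sgd(w, client_data[i]) for i in participating])
        if server_data is None:
            w = client_avg  # conventional FL: only participating clients count
        else:
            w_server = local_sgd(w, server_data)  # update on auxiliary data
            w = (1 - server_weight) * client_avg + server_weight * w_server
    return w

# Six clients, each holding one scalar; the global optimum is their mean, 5.0.
client_data = [np.array([v]) for v in (0.0, 1.0, 2.0, 8.0, 9.0, 10.0)]
participating = [0, 1, 2]          # clients 3-5 never participate
server_data = np.array([1.0, 9.0]) # small auxiliary sample at the server
global_opt = 5.0

plain = train(client_data, participating)
assisted = train(client_data, participating, server_data=server_data)
print("conventional FL error:", abs(plain - global_opt))
print("server-assisted error:", abs(assisted - global_opt))
```

In this toy setting, conventional FL converges to the optimum of the participating clients only, while the server-assisted variant is pulled toward the global optimum. How far it moves depends on the assumed `server_weight` and on how representative the auxiliary data is.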
