Communication efficiency is a major challenge in federated learning (FL). In client-server schemes, the server constitutes a bottleneck, and while decentralized setups spread communications across clients, they do not necessarily reduce them, because convergence is slower. We propose Multi-Token Coordinate Descent (MTCD), a communication-efficient algorithm for semi-decentralized vertical federated learning that exploits both client-server and client-client communications when each client holds a small subset of features. Our multi-token method can be seen as a parallel Markov chain (block) coordinate descent algorithm, and it subsumes the client-server and decentralized setups as special cases. We obtain a convergence rate of $\mathcal{O}(1/T)$ for nonconvex objectives when tokens roam over disjoint subsets of clients, and for convex objectives when they roam over possibly overlapping subsets. Numerical results show that MTCD improves on state-of-the-art communication efficiency and allows for a tunable amount of parallel communications.
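To make the idea of tokens roaming over disjoint subsets of clients more concrete, the following is a minimal toy sketch of a parallel, token-based block coordinate descent on a least-squares problem with vertically partitioned features. It is not the MTCD algorithm as specified in the paper: the function name `multi_token_cd`, the uniform random choice of the next client within a group, the fixed number of hops per round, the synthetic data, and the step size computed from the full feature matrix are all assumptions made purely for illustration.

```python
import numpy as np


def multi_token_cd(X_blocks, y, groups, step, rounds=30, hops=3, seed=0):
    """Toy multi-token block coordinate descent for least squares.

    X_blocks: list of (n, d_k) arrays, one feature block per client.
    groups:   disjoint lists of client indices; one token roams each group.
    All names and defaults here are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    theta = [np.zeros(Xk.shape[1]) for Xk in X_blocks]
    losses = []
    for _ in range(rounds):
        # Synchronization: combine every client's contribution to the prediction.
        z = sum(Xk @ tk for Xk, tk in zip(X_blocks, theta))
        losses.append(0.5 * np.mean((z - y) ** 2))
        # Each token starts from the same synchronized prediction. Tokens are
        # simulated one after another here but are conceptually parallel.
        for group in groups:
            token = z.copy()
            for _ in range(hops):
                # Assumed random walk: uniform choice among the group's clients.
                k = int(rng.choice(group))
                Xk = X_blocks[k]
                # Block gradient computed from the token and local features only.
                grad_k = Xk.T @ (token - y) / n
                delta = -step * grad_k
                theta[k] = theta[k] + delta
                # The token carries the updated prediction to the next client.
                token = token + Xk @ delta
    z = sum(Xk @ tk for Xk, tk in zip(X_blocks, theta))
    losses.append(0.5 * np.mean((z - y) ** 2))
    return theta, losses


# Synthetic data: 4 clients holding 2 features each (illustrative values).
data_rng = np.random.default_rng(1)
n, num_clients, d_k = 200, 4, 2
X_blocks = [data_rng.standard_normal((n, d_k)) for _ in range(num_clients)]
X = np.hstack(X_blocks)
w_true = data_rng.standard_normal(num_clients * d_k)
y = X @ w_true + 0.1 * data_rng.standard_normal(n)

# Conservative step 1/L, with L the largest eigenvalue of X^T X / n.
# Using the full matrix X is a toy convenience, not something a client could do.
step = n / np.linalg.norm(X, 2) ** 2

theta, losses = multi_token_cd(X_blocks, y, groups=[[0, 1], [2, 3]], step=step)
```

In this sketch, the number of groups controls how many tokens are active at once, and the number of hops controls how much client-to-client communication happens between synchronizations. Both are knobs of the toy example only and should not be read as the paper's exact parameterization.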