Most federated learning (FL) methods use a client-server scheme, where clients communicate only with a central server. However, this scheme is prone to bandwidth bottlenecks at the server and has a single point of failure. In contrast, in a (fully) decentralized approach, clients communicate directly with each other, dispensing with the server and mitigating these issues. Yet, as the client network grows larger and sparser, the convergence of decentralized methods slows down, and they fail to converge altogether if the network is disconnected. This work bridges the gap between client-server and decentralized schemes, focusing on the vertical FL setup, where clients hold different features of the same samples. We propose multi-token coordinate descent (MTCD), a flexible semi-decentralized method for vertical FL that can exploit both client-server and client-client links. With appropriate hyperparameter choices, MTCD recovers the client-server and decentralized schemes as special cases; in fact, its decentralized instance is itself a novel method of independent interest. Moreover, by controlling the degree of dependency on client-server links, MTCD can explore a spectrum of schemes ranging from client-server to fully decentralized. We prove that, for sufficiently large batch sizes, MTCD converges at an $\mathcal{O}(1/T)$ rate for nonconvex objectives when the tokens roam across disjoint subsets of clients. To capture the aforementioned drawbacks of the client-server scheme succinctly, we model the relative impact of using client-server versus client-client links as the ratio of their "costs", which depends on the application. This allows us to demonstrate, both analytically and empirically, that by tuning the degree of dependency on the server, the semi-decentralized instances of MTCD can outperform both client-server and decentralized approaches across a range of applications.