We study optimization methods to train local (or personalized) models for decentralized collections of local datasets with an intrinsic network structure. This network structure arises from domain-specific notions of similarity between local datasets. Examples of such notions include spatio-temporal proximity, statistical dependencies, and functional relations. Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization. This formulation unifies and considerably extends existing federated learning methods. It is highly flexible and can be combined with a broad range of parametric models, including generalized linear models and deep neural networks. Our main algorithmic contribution is a fully decentralized federated learning algorithm, obtained by applying an established primal-dual method to solve GTV minimization. It can be implemented as message passing and is robust against inexact computations that arise from limited computational resources such as processing time or bandwidth. Our main analytic contribution is an upper bound on the deviation between the local model parameters learnt by our algorithm and those learnt by an oracle-based clustered federated learning method. This upper bound reveals conditions on the local models and on the network structure of the local datasets under which GTV minimization is able to pool (nearly) homogeneous local datasets.
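To make the GTV formulation concrete, the following is a minimal sketch under several assumptions that the abstract does not state. Each node holds a scalar model parameter, the local loss is the average squared error, the variation penalty is the squared difference across each weighted edge, and the solver is plain gradient descent rather than the primal-dual method used in the paper. The names `gtv_objective` and `gtv_gradient_step`, the toy data, the edge weights, and the value of `lam` are all hypothetical.

```python
def gtv_objective(w, data, edges, lam):
    """Sum of local losses plus lam times the weighted total variation."""
    local = sum(
        sum((w[i] - y) ** 2 for y in ys) / len(ys) for i, ys in data.items()
    )
    variation = sum(a * (w[i] - w[j]) ** 2 for (i, j), a in edges.items())
    return local + lam * variation


def gtv_gradient_step(w, data, edges, lam, step):
    """One synchronous gradient step; node i only needs its neighbours' values."""
    # Gradient of the average squared error at node i is 2 * (w_i - mean(y_i)).
    grad = {i: 2.0 * (w[i] - sum(ys) / len(ys)) for i, ys in data.items()}
    # Each edge contributes equal and opposite terms to its two endpoints.
    for (i, j), a in edges.items():
        diff = 2.0 * lam * a * (w[i] - w[j])
        grad[i] += diff
        grad[j] -= diff
    return {i: w[i] - step * grad[i] for i in w}


# Toy network (assumed): nodes 0 and 1 share one distribution, nodes 2 and 3
# another; a weak edge between nodes 1 and 2 links the two clusters.
data = {
    0: [0.9, 1.1],
    1: [1.2, 0.8, 1.0],
    2: [5.1, 4.9],
    3: [5.0, 5.2, 4.8],
}
edges = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.01}
lam = 10.0

w = {i: 0.0 for i in data}
start = gtv_objective(w, data, edges, lam)
for _ in range(500):
    w = gtv_gradient_step(w, data, edges, lam, step=0.02)
final = gtv_objective(w, data, edges, lam)
```

In this toy setting, strongly connected nodes end up with nearly identical parameters while the weakly linked clusters stay apart, which is the pooling behaviour the abstract attributes to GTV minimization. Each update uses only a node's own data and the current values of its neighbours, which is what makes a message-passing implementation possible.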