We consider distributed stochastic variational inequalities (VIs) on unbounded domains, where the problem data is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers fully decentralized computation over time-varying networks as well as the centralized topologies commonly used in Federated Learning. Moreover, workers can perform multiple local updates to reduce the frequency of communication between them. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly show the dependence on network characteristics (e.g., mixing time), the iteration counter, data heterogeneity, variance, the number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPPs), e.g., to the training of Deep Generative Adversarial Networks (GANs), for which decentralized training has been reported to be extremely challenging. In experiments on the decentralized training of GANs, we demonstrate the effectiveness of our proposed approach.
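To make the ingredients named above concrete, the following is a minimal sketch of an extragradient step combined with local updates and gossip averaging. It is not the algorithm analyzed in the paper. It assumes a fixed (not time-varying) doubly stochastic mixing matrix `W`, affine local operators F_i(z) = A_i z - b_i, and additive Gaussian noise as the stochastic oracle. The names `ring_mixing_matrix`, `decentralized_extragradient`, `local_steps`, and `noise` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring of n workers (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def decentralized_extragradient(ops, W, steps=2000, lr=0.05,
                                local_steps=1, noise=0.0, seed=0):
    """Sketch: local extragradient steps followed by periodic gossip averaging.

    ops  : list of (A_i, b_i) pairs defining the assumed local operators
           F_i(z) = A_i z - b_i, one pair per worker.
    W    : (n, n) mixing matrix, assumed fixed and doubly stochastic.
    Returns an (n, dim) array holding each worker's final iterate.
    """
    rng = np.random.default_rng(seed)
    n, dim = len(ops), ops[0][1].shape[0]
    Z = np.zeros((n, dim))

    def oracle(i, z):
        # Stochastic oracle: exact local operator plus Gaussian noise (assumption).
        A, b = ops[i]
        return A @ z - b + noise * rng.normal(size=dim)

    for t in range(steps):
        for i in range(n):
            # Extrapolation step, then update using the operator at the
            # extrapolated point.
            z_half = Z[i] - lr * oracle(i, Z[i])
            Z[i] = Z[i] - lr * oracle(i, z_half)
        if (t + 1) % local_steps == 0:
            # Communication round: each worker averages with its neighbors.
            Z = W @ Z
    return Z


if __name__ == "__main__":
    # Two workers with different (heterogeneous) strongly monotone operators.
    ops = [
        (np.array([[1.0, 2.0], [-2.0, 1.0]]), np.array([1.0, 0.0])),
        (np.array([[1.0, 1.0], [-1.0, 1.0]]), np.array([0.0, 1.0])),
    ]
    Z = decentralized_extragradient(ops, ring_mixing_matrix(2), local_steps=2)
    print(Z)
```

With heterogeneous operators and a constant step size, a scheme of this kind should generally be expected to reach only a neighborhood of the solution of the averaged problem. The size of that neighborhood is the kind of dependence on heterogeneity, step size, and network mixing that the stated rates quantify.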