Variational inequalities in general, and saddle point problems in particular, are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport, and robust optimization. As the data and problem sizes needed to train high-performing models grow, we must rely on parallel and distributed computing. In distributed training, however, communication among the compute nodes is a key bottleneck, and this problem is exacerbated for high-dimensional and over-parameterized models. It is therefore important to equip existing methods with strategies that reduce the volume of information transmitted during training while still obtaining a model of comparable quality. In this paper, we present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2. Our theory and methods allow for the use of both unbiased compressors (such as Rand$k$; MASHA1) and contractive compressors (such as Top$k$; MASHA2). The new algorithms support bidirectional compression and can also be modified for the stochastic setting with batches and for federated learning with partial participation of clients. We empirically validate our conclusions using two experimental setups: a standard bilinear min-max problem and large-scale distributed adversarial training of transformers.
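To make the two compressor classes named in the abstract concrete, the following is a minimal sketch of Rand$k$ and Top$k$ in their standard textbook form. It is not an implementation of MASHA1 or MASHA2; the function names `rand_k`, `top_k`, and `sq_norm` are illustrative choices, not taken from the paper. Rand$k$ keeps $k$ of the $d$ coordinates chosen uniformly at random and rescales them by $d/k$, which makes it unbiased. Top$k$ keeps the $k$ largest-magnitude coordinates without rescaling, which makes it biased but contractive: $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$.

```python
import random
from typing import List, Optional


def rand_k(x: List[float], k: int, rng: Optional[random.Random] = None) -> List[float]:
    """Rand-k sketch: keep k uniformly chosen coordinates, scaled by d/k.

    The d/k scaling makes the compressor unbiased: E[rand_k(x)] = x.
    """
    rng = rng or random.Random()
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def top_k(x: List[float], k: int) -> List[float]:
    """Top-k sketch: keep the k largest-magnitude coordinates, zero the rest.

    No rescaling is applied, so the output is biased but contractive.
    """
    d = len(x)
    by_magnitude = sorted(range(d), key=lambda i: abs(x[i]), reverse=True)
    kept = set(by_magnitude[:k])
    return [v if i in kept else 0.0 for i, v in enumerate(x)]


def sq_norm(x: List[float]) -> float:
    """Squared Euclidean norm, used to check the contraction inequality."""
    return sum(v * v for v in x)


x = [3.0, -1.0, 0.5, -4.0]
print(top_k(x, 2))  # [3.0, 0.0, 0.0, -4.0]
print(rand_k(x, 2, random.Random(0)))  # two coordinates of x doubled, the others zero
```

In both cases only $k$ values and their indices need to be transmitted instead of all $d$ coordinates, which is the source of the communication savings the abstract refers to.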