We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors. This setting covers a broad range of important problems, from distributed convex minimization to min-max optimization and games. Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient. To this end, we propose quantized generalized extra-gradient (Q-GenX), an unbiased and adaptive compression method tailored to solving VIs. We provide an adaptive step-size rule that adapts to the noise profile at hand and achieves a fast rate of ${\mathcal O}(1/T)$ under relative noise and an order-optimal rate of ${\mathcal O}(1/\sqrt{T})$ under absolute noise. We also show that distributed training accelerates convergence. Finally, we validate our theoretical results with real-world experiments and by training generative adversarial networks on multiple GPUs.
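To make the setting concrete, the following is a minimal sketch of extra-gradient with unbiased stochastic quantization. It is not the Q-GenX algorithm itself: it assumes a constant step size and uniform quantization levels instead of the adaptive rules described above, and it uses the toy bilinear game $\min_x \max_y xy$ as the monotone operator. The names `quantize`, `operator`, and `quantized_extra_gradient` are hypothetical and chosen for illustration only.

```python
import numpy as np


def quantize(v, levels, rng):
    """Unbiased stochastic quantization of v onto `levels` uniform levels.

    Each |v_i| / ||v|| is rounded up or down at random so that the
    expectation of the output equals v (illustrative scheme, an assumption).
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels        # values in [0, levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part.
    q = lower + (rng.random(v.shape) < (scaled - lower))
    return norm * np.sign(v) * q / levels


def operator(z):
    """Monotone operator of the toy game min_x max_y x*y, with z = (x, y)."""
    x, y = z
    return np.array([y, -x])


def quantized_extra_gradient(z0, step, iters, workers, levels, seed=0):
    """Extra-gradient where each worker sends a quantized dual vector."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        # Extrapolation step: average the workers' quantized vectors at z.
        g = np.mean(
            [quantize(operator(z), levels, rng) for _ in range(workers)], axis=0
        )
        z_half = z - step * g
        # Update step: same averaging, evaluated at the extrapolated point.
        g_half = np.mean(
            [quantize(operator(z_half), levels, rng) for _ in range(workers)], axis=0
        )
        z = z - step * g_half
    return z


if __name__ == "__main__":
    z_final = quantized_extra_gradient(
        [1.0, 1.0], step=0.1, iters=2000, workers=4, levels=4
    )
    print(np.linalg.norm(z_final))
```

In this sketch the quantization error scales with the norm of the vector being compressed, so it vanishes at the solution $(0, 0)$; this is one simple instance of the relative-noise regime. Averaging over more workers reduces the variance of the aggregated vector.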