Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems

We consider a class of non-smooth strongly convex-strongly concave saddle point problems in a decentralized setting without a central server. To solve a consensus formulation of problems in this class, we develop an inexact primal-dual hybrid gradient (IPDHG) procedure that allows generic gradient computation oracles to update the primal and dual variables. We first investigate the performance of IPDHG with a stochastic variance reduced gradient (SVRG) oracle. Our numerical study reveals that the iterates of IPDHG with the SVRG oracle make markedly conservative progress in the initial phase. To tackle this, we develop a simple and effective switching idea: a generalized stochastic gradient (GSG) computation oracle is employed to hasten the iterates' progress toward a saddle point solution during the initial phase of updates, followed by a switch to the SVRG oracle at an appropriate juncture. The proposed algorithm is named Decentralized Proximal Switching Stochastic Gradient method with Compression (C-DPSSG), and is proven to converge to an $\epsilon$-accurate saddle point solution at a linear rate. Apart from delivering highly accurate solutions, our study reveals that exploiting the best convergence phases of the GSG and SVRG oracles makes C-DPSSG well suited for obtaining low- or medium-accuracy solutions faster, which is useful for certain applications. Numerical experiments on two benchmark machine learning applications demonstrate C-DPSSG's competitive performance and validate our theoretical findings. The code used in the experiments is available \href{https://github.com/chhavisharma123/C-DPSSG-CDC2023}{here}.
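The switching idea can be illustrated on a toy problem. The sketch below is not C-DPSSG: it assumes a single node (no consensus and no compression), a smooth scalar quadratic saddle point problem with no proximal term, a plain single-sample stochastic gradient as a stand-in for the GSG oracle, and a fixed, hand-picked switch iteration. All function names and parameter values (`make_problem`, `switching_gda`, `switch_at`, `step`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy finite-sum saddle problem (an assumption for illustration only):
#   f_i(x, y) = 0.5*x^2 + b_i*x*y - 0.5*y^2 + d_i*x - e_i*y
# It is strongly convex in x and strongly concave in y.

def make_problem(n=50, seed=0):
    """Draw the (b_i, d_i, e_i) coefficients of the n components."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
            for _ in range(n)]

def grad(comp, x, y):
    """Partial derivatives of one component f_i with respect to x and y."""
    b, d, e = comp
    return x + b * y + d, b * x - y - e

def full_gradient(data, x, y):
    """Partial derivatives of the averaged objective."""
    gx = sum(grad(c, x, y)[0] for c in data) / len(data)
    gy = sum(grad(c, x, y)[1] for c in data) / len(data)
    return gx, gy

def saddle_point(data):
    """Closed-form saddle point of the averaged quadratic objective."""
    n = len(data)
    B = sum(c[0] for c in data) / n
    D = sum(c[1] for c in data) / n
    E = sum(c[2] for c in data) / n
    x = (B * E - D) / (1 + B * B)
    return x, B * x - E

def switching_gda(data, switch_at=200, svrg_epochs=15, epoch_len=20,
                  step=0.1, seed=1):
    """Gradient descent-ascent that switches stochastic oracles once."""
    rng = random.Random(seed)
    x, y = 5.0, -5.0
    # Phase 1: plain stochastic gradient oracle (stand-in for GSG).
    for _ in range(switch_at):
        gx, gy = grad(rng.choice(data), x, y)
        x, y = x - step * gx, y + step * gy
    # Phase 2: SVRG oracle with a snapshot refreshed every epoch.
    for _ in range(svrg_epochs):
        sx, sy = x, y
        mx, my = full_gradient(data, sx, sy)
        for _ in range(epoch_len):
            comp = rng.choice(data)
            gx, gy = grad(comp, x, y)
            hx, hy = grad(comp, sx, sy)
            # Variance-reduced estimate: g_i(z) - g_i(snapshot) + full(snapshot)
            x, y = x - step * (gx - hx + mx), y + step * (gy - hy + my)
    return x, y

if __name__ == "__main__":
    data = make_problem()
    xs, ys = saddle_point(data)
    x, y = switching_gda(data)
    print("distance to saddle point:", math.hypot(x - xs, y - ys))
```

In this sketch the first phase only carries the iterates into a noise-dominated neighbourhood of the saddle point, and the variance-reduced second phase refines them from there. How to choose the switching point in the decentralized setting is not specified in the abstract, so the fixed `switch_at` value here is purely a placeholder.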
