Cyclic block coordinate methods are a fundamental class of optimization methods, widely used in practice and implemented in standard software packages for statistical learning. Nevertheless, their convergence is generally not well understood, and existing convergence analyses have so far not explained their good practical performance. In this work, we introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators. This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by existing work. The resulting convergence bounds match the optimal convergence bounds of full gradient methods, but are stated in terms of a novel gradient Lipschitz condition with respect to a Mahalanobis norm. For $m$ coordinate blocks, the resulting gradient Lipschitz constant in our bounds is never more than a factor of $\sqrt{m}$ larger than the traditional Euclidean Lipschitz constant, and it can be much smaller. Further, for the case when the operator in the VI has finite-sum structure, we propose a variance-reduced variant of our method, which further decreases the per-iteration cost and attains better convergence rates in certain regimes. To obtain these results, we use a gradient extrapolation strategy that allows us to view a cyclic collection of block coordinate-wise gradients as one implicit gradient.
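For orientation, the problem class named above can be written in its standard textbook form. The symbols $F$, $\mathcal{X}$, $f$, and $\phi$ below are assumed notation for illustration and need not match the notation of the paper itself. Given a closed convex set $\mathcal{X} \subseteq \mathbb{R}^d$ and an operator $F : \mathcal{X} \to \mathbb{R}^d$, the VI problem asks for a point $x^* \in \mathcal{X}$ such that
\[
  \langle F(x^*),\, x - x^* \rangle \;\ge\; 0 \qquad \text{for all } x \in \mathcal{X},
\]
and $F$ is called monotone if
\[
  \langle F(x) - F(y),\, x - y \rangle \;\ge\; 0 \qquad \text{for all } x, y \in \mathcal{X}.
\]
Two of the special cases mentioned above arise as follows. Minimizing a differentiable convex function $f$ over $\mathcal{X}$ corresponds to $F = \nabla f$. A convex-concave min-max problem $\min_{u} \max_{v} \phi(u, v)$ with differentiable $\phi$ corresponds to $x = (u, v)$ and
\[
  F(u, v) \;=\; \bigl( \nabla_u \phi(u, v),\; -\nabla_v \phi(u, v) \bigr).
\]
In both cases $F$ is monotone, so a method analyzed for monotone VIs covers both settings at once.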