Fast stochastic dual coordinate descent algorithms for linearly constrained convex optimization

Finding a solution to the linear system $Ax = b$ with various minimization properties arises in many engineering and computer science applications, including compressed sensing, image processing, and machine learning. In the age of big data, the scalability of stochastic optimization algorithms has made them increasingly important for solving problems of unprecedented size. This paper focuses on the problem of minimizing a strongly convex objective function subject to linear constraints. We consider the dual formulation of this problem and adopt stochastic coordinate descent to solve it. The proposed algorithmic framework, called fast stochastic dual coordinate descent, utilizes an adaptive variation of Polyak's heavy ball momentum and user-defined distributions for sampling. Our adaptive heavy ball momentum technique can efficiently update the parameters by using iterative information, overcoming the limitation of the heavy ball momentum method, which requires prior knowledge of certain parameters such as the singular values of a matrix. We prove that, under strong admissibility of the objective function, the proposed method converges linearly in expectation. By varying the sampling matrix, we recover a comprehensive array of well-known algorithms as special cases, including the randomized sparse Kaczmarz method, the randomized regularized Kaczmarz method, the linearized Bregman iteration, and a variant of the conjugate gradient (CG) method. Numerical experiments are provided to confirm our results.
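As a rough illustration of one special case named in the abstract, the randomized sparse Kaczmarz method can be sketched as stochastic dual coordinate descent for minimizing $\lambda\|x\|_1 + \frac{1}{2}\|x\|_2^2$ subject to $Ax = b$. The function names, the sampling of rows in proportion to their squared norms, and the default parameters are assumptions made for illustration. The sketch deliberately omits the adaptive heavy ball momentum, since its update rule is not stated in the abstract.

```python
import numpy as np


def soft_threshold(z, lam):
    """Componentwise soft-thresholding: the gradient of the convex
    conjugate of f(x) = lam*||x||_1 + 0.5*||x||_2^2."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def randomized_sparse_kaczmarz(A, b, lam=1.0, iters=20000, seed=0):
    """Illustrative sketch (not the paper's accelerated method).

    Each step is an exact-step-size-free dual coordinate update:
    pick a row i, then move z = A^T y along a_i by the scaled residual.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    # Assumed sampling distribution: rows chosen proportionally to ||a_i||^2.
    probs = row_norms_sq / row_norms_sq.sum()
    indices = rng.choice(m, size=iters, p=probs)

    z = np.zeros(n)  # z = A^T y, where y is the dual variable
    x = np.zeros(n)  # primal iterate recovered from z
    for i in indices:
        residual = A[i] @ x - b[i]              # i-th dual partial derivative
        z -= (residual / row_norms_sq[i]) * A[i]
        x = soft_threshold(z, lam)              # map the dual iterate back to the primal
    return x
```

With `lam=0.0` the soft-thresholding step becomes the identity and the loop reduces to the classical randomized Kaczmarz method, which is a convenient sanity check for the sketch.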

Related content

Coordinate descent

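Coordinate descent is a derivative-free optimization algorithm. At each iteration, it performs a one-dimensional search along a single coordinate direction from the current point to find a local minimum of the function, cycling through the different coordinate directions over the course of the run. For non-separable functions, the algorithm may fail to reach the optimum within a small number of iterations. To speed up convergence, one can switch to a more suitable coordinate system, for example one obtained by principal component analysis, in which the coordinates are as uncorrelated as possible.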
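The cyclic scheme described above can be sketched on a simple test problem. The example below assumes a strictly convex quadratic $f(x) = \frac{1}{2}x^\top Q x - c^\top x$ with symmetric positive definite $Q$, for which the one-dimensional search along each coordinate has a closed form (this coincides with the Gauss-Seidel iteration for $Qx = c$). The function name and the fixed number of sweeps are illustrative choices.

```python
import numpy as np


def cyclic_coordinate_descent(Q, c, sweeps=200):
    """Minimize 0.5 * x^T Q x - c^T x for symmetric positive definite Q
    by exact minimization along one coordinate at a time."""
    n = len(c)
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            # The partial derivative along coordinate i is Q[i] @ x - c[i];
            # dividing by the curvature Q[i, i] gives the exact 1-D minimizer.
            x[i] += (c[i] - Q[i] @ x) / Q[i, i]
    return x
```

When the off-diagonal entries of `Q` are large, the coordinates are strongly coupled and more sweeps are needed, which is the situation where a change of coordinate system helps.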
