Fast stochastic dual coordinate descent algorithms for linearly constrained convex optimization

The problem of finding a solution to the linear system $Ax = b$ with certain minimization properties arises in numerous scientific and engineering areas. In the era of big data, stochastic optimization algorithms have become increasingly significant because they scale to problems of unprecedented size. This paper focuses on the problem of minimizing a strongly convex function subject to linear constraints. We consider the dual formulation of this problem and apply stochastic coordinate descent to solve it. The proposed algorithmic framework, called fast stochastic dual coordinate descent, uses sampling matrices drawn from user-defined distributions to extract gradient information. Moreover, it employs Polyak's heavy ball momentum acceleration with adaptive parameters learned through the iterations, which removes the usual requirement of the heavy ball momentum method for prior knowledge of certain parameters, such as the singular values of a matrix. With these extensions, the framework recovers many well-known methods in this setting, including the randomized sparse Kaczmarz method, the randomized regularized Kaczmarz method, the linearized Bregman iteration, and a variant of the conjugate gradient (CG) method. We prove that, with a strongly admissible objective function, the proposed method converges linearly in expectation. Numerical experiments are provided to confirm our results.
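To make the dual coordinate descent idea concrete, here is a minimal sketch of one of the special cases the abstract names, the randomized sparse Kaczmarz method, for the objective $f(x) = \lambda\|x\|_1 + \tfrac{1}{2}\|x\|_2^2$ subject to $Ax = b$. It is not the paper's full framework: it samples a single row per step (with probability proportional to its squared norm) and uses no heavy ball momentum, since the adaptive momentum rule is not given in the abstract. The function names, the default `lam`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    """Componentwise soft shrinkage; this is the gradient of f* for
    f(x) = lam * ||x||_1 + 0.5 * ||x||_2^2."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def randomized_sparse_kaczmarz(A, b, lam=0.1, n_iters=30000, seed=0):
    """Sketch of randomized sparse Kaczmarz (no momentum).

    Each step is a coordinate descent step on the dual problem
    min_y f*(A^T y) - b^T y, tracked through z = A^T y.
    Assumes the system A x = b is consistent and A has no zero rows.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    # Pre-sample all row indices; row i is drawn with probability ||a_i||^2 / ||A||_F^2.
    rows = rng.choice(m, size=n_iters, p=probs)
    z = np.zeros(n)  # z = A^T y for the (implicit) dual variable y
    x = np.zeros(n)  # primal iterate, x = soft_threshold(z, lam)
    for i in rows:
        residual = A[i] @ x - b[i]  # i-th coordinate of the dual gradient
        z -= (residual / row_norms_sq[i]) * A[i]
        x = soft_threshold(z, lam)
    return x
```

Setting `lam=0` turns the soft threshold into the identity, and the loop reduces to the classical randomized Kaczmarz method.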


Related content

Coordinate descent


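Coordinate descent is a derivative-free optimization algorithm. At each iteration it performs a one-dimensional search along a single coordinate direction from the current point to find a local minimum of the function, cycling through the different coordinate directions over the course of the run. For non-separable functions, the algorithm may fail to reach the optimum in a small number of iterations. To speed up convergence, one can switch to a more suitable coordinate system, for example one obtained by principal component analysis, in which the coordinates are as uncorrelated as possible.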
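As a small illustration of the cyclic scheme described above, the sketch below applies coordinate descent to a strongly convex quadratic $f(x) = \tfrac{1}{2}x^\top Q x - c^\top x$. For a quadratic, the one-dimensional search along coordinate $i$ has a closed form, so the exact minimizer is used instead of a numerical line search; the function name and the sweep count are illustrative assumptions.

```python
import numpy as np

def cyclic_coordinate_descent(Q, c, n_sweeps=500):
    """Minimize 0.5 * x^T Q x - c^T x by exact coordinate minimization.

    Assumes Q is symmetric positive definite, so every Q[i, i] > 0.
    Each sweep visits the coordinates in the fixed order 0, 1, ..., n-1.
    """
    n = len(c)
    x = np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(n):
            # Partial derivative of f with respect to x[i] at the current point.
            grad_i = Q[i] @ x - c[i]
            # Exact 1-D minimizer along coordinate i for a quadratic.
            x[i] -= grad_i / Q[i, i]
    return x
```

On this problem the iteration coincides with the Gauss-Seidel method for the linear system $Qx = c$, so the result can be checked against a direct solve.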
