We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time $$\tilde O\big((n^2+nk^{\omega-1})\log(1/\epsilon)\big),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a factor polylogarithmic in $n$. When $k=O(n^{1-\theta})$ for some $\theta>0$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and we accelerate the block coordinate descent update via matrix sketching.
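As a quick sanity check on the exponent $0.729$, here is a back-of-the-envelope calculation that uses only the stated runtime and the bound $\omega<2.372$. The runtime is $\tilde O(n^2)$ (for fixed $\epsilon$) whenever the second term does not dominate the first: $$nk^{\omega-1}\leq n^2 \iff k\leq n^{1/(\omega-1)},\qquad \frac{1}{\omega-1}>\frac{1}{1.372}\approx 0.7289,$$ which rounds to the exponent $0.729$ quoted above.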
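To make the block coordinate descent viewpoint concrete, the following is a minimal sketch and not the algorithm analyzed here. It assumes a symmetric positive definite $A$, samples each block uniformly at random (a simple stand-in for the determinantal point process and the cheaper sampling scheme mentioned above), and solves each block system exactly rather than via matrix sketching. The function names, the test matrix, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def block_coordinate_descent(A, b, block_size, num_iters, seed=0):
    """Approximately solve A x = b for a symmetric positive definite A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    for _ in range(num_iters):
        # Assumption: uniform sampling without replacement, standing in for
        # the more refined block sampling schemes described in the text.
        S = rng.choice(n, size=block_size, replace=False)
        # Residual restricted to the sampled coordinates.
        r_S = A[S] @ x - b[S]
        # Exact minimization over the block: solve the small system given by
        # the principal submatrix A[S, S] (no sketching in this sketch).
        x[S] -= np.linalg.solve(A[np.ix_(S, S)], r_S)
    return x


def make_flat_tailed_spd(n, k, seed=0):
    """Hypothetical test matrix: k larger eigenvalues, the rest equal to 1."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
    eigs = np.ones(n)
    eigs[:k] = np.linspace(5.0, 10.0, k)
    return (Q * eigs) @ Q.T  # Q diag(eigs) Q^T


if __name__ == "__main__":
    A = make_flat_tailed_spd(200, 10, seed=1)
    b = np.random.default_rng(2).standard_normal(200)
    x = block_coordinate_descent(A, b, block_size=40, num_iters=2000, seed=3)
    print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Each iteration touches only `block_size` rows of $A$ and solves one `block_size`-by-`block_size` system, which illustrates the trade-off named above: larger blocks improve convergence per step but raise the per-iteration cost.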