We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time $$\tilde O\big((n^2+nk^{\omega-1})\log(1/\epsilon)\big),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a factor polylogarithmic in $n$. When $k=O(n^{1-\theta})$ for some $\theta>0$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is to ensure good convergence and fast per-iteration time simultaneously. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and we accelerate the block coordinate descent update via matrix sketching.
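To make the phrase "randomized block coordinate descent" concrete, the following is a minimal sketch for a symmetric positive definite system, not the algorithm analyzed here. It makes two simplifying assumptions: blocks are sampled uniformly at random (rather than with a determinantal point process or the cheaper scheme mentioned above), and each block subproblem is solved exactly (rather than accelerated by sketching). The names `block_coordinate_descent` and `flat_tailed_spd` are hypothetical and introduced only for illustration.

```python
import numpy as np


def block_coordinate_descent(A, b, block_size, tol=1e-8, max_iters=20_000, seed=0):
    """Randomized block coordinate descent for A x = b, A symmetric positive definite.

    Assumption: blocks are drawn uniformly without replacement, which is a
    simplification of the sampling schemes described in the abstract.
    Returns the approximate solution and the number of iterations used.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    r = b.astype(float).copy()  # residual b - A x, valid because x starts at zero
    b_norm = np.linalg.norm(b)
    for it in range(max_iters):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        S = rng.choice(n, size=block_size, replace=False)
        # Exact minimization of the quadratic over the coordinates in S:
        # solve A[S, S] @ delta = r[S].
        delta = np.linalg.solve(A[np.ix_(S, S)], r[S])
        x[S] += delta
        # Only the columns of A indexed by S are needed to update the residual.
        r -= A[:, S] @ delta
    return x, max_iters


def flat_tailed_spd(n, k, seed=0):
    """Hypothetical test matrix: k large eigenvalues and a flat tail of ones."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigs = np.ones(n)
    eigs[:k] = np.linspace(5.0, 50.0, k)
    return (Q * eigs) @ Q.T  # Q @ diag(eigs) @ Q.T


if __name__ == "__main__":
    A = flat_tailed_spd(200, 10)
    b = np.ones(200)
    x, iters = block_coordinate_descent(A, b, block_size=40)
    print(iters, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Each iteration touches one `block_size`-by-`block_size` principal submatrix and `block_size` columns of `A`, which illustrates the tension named above between per-iteration cost and convergence speed: larger blocks capture more of the top of the spectrum but make each step more expensive.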