Approximating invariant subspaces of generalized eigenvalue problems (GEPs) is a fundamental computational problem at the core of machine learning and scientific computing. It underlies, for example, Principal Component Analysis (PCA) for dimensionality reduction, data visualization, and noise filtering, as well as Density Functional Theory (DFT), arguably the most popular method for calculating the electronic structure of materials. Given Hermitian $H,S\in\mathbb{C}^{n\times n}$, where $S$ is positive-definite, let $\Pi_k$ be the true spectral projector onto the invariant subspace associated with the $k$ smallest (or largest) eigenvalues of the GEP $HC=SC\Lambda$, for some $k\in[n]$. We show that we can compute a matrix $\widetilde\Pi_k$ such that $\lVert\Pi_k-\widetilde\Pi_k\rVert_2\leq \epsilon$, in $O\left( n^{\omega+\eta}\mathrm{polylog}(n,\epsilon^{-1},\kappa(S),\mathrm{gap}_k^{-1}) \right)$ bit operations in the floating point model, for some $\epsilon\in(0,1)$, with probability $1-1/n$. Here, $\eta>0$ is arbitrarily small, $\omega\lesssim 2.372$ is the matrix multiplication exponent, $\kappa(S)=\lVert S\rVert_2\lVert S^{-1}\rVert_2$ is the condition number of $S$, and $\mathrm{gap}_k$ is the gap between eigenvalues $k$ and $k+1$. To achieve such provable "forward-error" guarantees, our methods rely on a new $O(n^{\omega+\eta})$ stability analysis for the Cholesky factorization and a smoothed analysis for computing spectral gaps, both of which may be of independent interest. Ultimately, we obtain new matrix multiplication-type bit complexity upper bounds for PCA problems, including classical PCA and (randomized) low-rank approximation.
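To make the objects in the statement concrete, the following is a minimal dense reference sketch in Python (NumPy). It is not the nearly matrix multiplication time algorithm claimed above; it uses the textbook $O(n^3)$ route of reducing the GEP to a standard Hermitian eigenproblem via a Cholesky factor of $S$. The function name `gep_spectral_projector` is hypothetical, and the convention $\Pi_k = C_k C_k^* S$ with $C^* S C = I$ is an assumption made here for illustration; the work itself may normalize the projector differently.

```python
import numpy as np

def gep_spectral_projector(H, S, k):
    """Dense O(n^3) reference for the GEP  H C = S C Lambda.

    Returns (Pi_k, gap_k), where Pi_k = C_k C_k^* S projects onto the
    span of the eigenvectors of the k smallest eigenvalues (assumed
    convention, with C^* S C = I), and gap_k = lambda_{k+1} - lambda_k.
    Requires 1 <= k < n so that the gap is defined.
    """
    n = H.shape[0]
    if not (1 <= k < n):
        raise ValueError("k must satisfy 1 <= k < n")

    # Cholesky factor: S = L L^*, with L lower triangular.
    L = np.linalg.cholesky(S)

    # Reduced standard problem: A = L^{-1} H L^{-*}.
    X = np.linalg.solve(L, H)            # X = L^{-1} H
    A = np.linalg.solve(L, X.conj().T)   # L^{-1} H L^{-*}, since H is Hermitian
    A = 0.5 * (A + A.conj().T)           # remove rounding asymmetry

    # Eigenvalues in ascending order, orthonormal eigenvectors V.
    w, V = np.linalg.eigh(A)

    # Back-transform: C = L^{-*} V satisfies H C = S C diag(w), C^* S C = I.
    C = np.linalg.solve(L.conj().T, V)
    Ck = C[:, :k]

    Pi_k = Ck @ Ck.conj().T @ S
    gap_k = w[k] - w[k - 1]
    return Pi_k, gap_k
```

Under this convention, `Pi_k` is idempotent and has trace `k`, which gives a quick sanity check on small random inputs. The quantities $\kappa(S)$ and $\mathrm{gap}_k$ in the complexity bound correspond to `np.linalg.cond(S)` and the returned `gap_k`.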