We present a sublinear-time algorithm for computing a near-optimal low-rank approximation to any positive semidefinite (PSD) Toeplitz matrix $T\in \mathbb{R}^{d\times d}$, given noisy access to its entries. In particular, given entrywise query access to $T+E$ for an arbitrary noise matrix $E\in \mathbb{R}^{d\times d}$, an integer rank $k\leq d$, and an error parameter $\delta>0$, our algorithm runs in time $\text{poly}(k,\log(d/\delta))$ and outputs (in factored form) a Toeplitz matrix $\widetilde{T} \in \mathbb{R}^{d \times d}$ with rank $\text{poly}(k,\log(d/\delta))$ satisfying, for some fixed constant $C$, \begin{equation*} \|T-\widetilde{T}\|_F \leq C \cdot \max\{\|E\|_F,\|T-T_k\|_F\} + \delta \cdot \|T\|_F. \end{equation*} Here $\|\cdot \|_F$ is the Frobenius norm and $T_k$ is the best (not necessarily Toeplitz) rank-$k$ approximation to $T$ in the Frobenius norm, given by projecting $T$ onto its top $k$ eigenvectors. Our result has the following applications. When $E = 0$, we obtain the first sublinear-time near-relative-error low-rank approximation algorithm for PSD Toeplitz matrices, resolving the main open problem of Kapralov et al. (SODA '23), whose algorithm had sublinear query complexity but exponential runtime. Our algorithm can also be applied to approximate the unknown Toeplitz covariance matrix of a multivariate Gaussian distribution, given sample access to this distribution, resolving an open question of Eldar et al. (SODA '20). Our algorithm applies sparse Fourier transform techniques to recover a low-rank Toeplitz matrix using its Fourier structure. Our key technical contribution is the first polynomial-time algorithm for \emph{discrete time off-grid} sparse Fourier recovery, which may be of independent interest.
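To make the Fourier structure and the benchmark $\|T-T_k\|_F$ concrete, the following is a minimal NumPy sketch, not the sublinear-time algorithm itself. It builds a small PSD Toeplitz matrix as a nonnegative combination of cosines at off-grid frequencies (frequencies that are not multiples of $1/d$) and computes $T_k$ by projecting onto the top $k$ eigenvectors with a full dense eigendecomposition. The function names `psd_toeplitz_from_frequencies` and `best_rank_k`, the dimension, and the chosen frequencies and weights are assumptions made for illustration only.

```python
import numpy as np

def psd_toeplitz_from_frequencies(d, freqs, weights):
    """Hypothetical helper: T[j, l] = sum_m w_m * cos(2*pi*f_m*(j - l)).

    With nonnegative weights this is a real, symmetric, PSD Toeplitz matrix.
    """
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    lags = np.arange(d)
    # First column: t[n] = sum_m w_m * cos(2*pi*f_m*n)
    first_col = np.cos(2 * np.pi * np.outer(lags, freqs)) @ weights
    # Entry (j, l) depends only on |j - l|, which is the Toeplitz property.
    return first_col[np.abs(lags[:, None] - lags[None, :])]

def best_rank_k(T, k):
    """Project a PSD matrix T onto its top-k eigenvectors (dense, O(d^3))."""
    vals, vecs = np.linalg.eigh(T)   # eigenvalues in ascending order
    top_vecs = vecs[:, -k:]          # eigenvectors of the k largest eigenvalues
    return top_vecs @ (top_vecs.T @ T)

d = 64
freqs = [0.0731, 0.2117, 0.3893]     # illustrative off-grid frequencies
weights = [3.0, 1.5, 0.5]            # illustrative nonnegative weights
T = psd_toeplitz_from_frequencies(d, freqs, weights)

eigvals = np.linalg.eigvalsh(T)
numerical_rank = int(np.sum(eigvals > 1e-8 * eigvals.max()))
tail_k2 = np.linalg.norm(T - best_rank_k(T, 2), "fro")   # ||T - T_2||_F
tail_k6 = np.linalg.norm(T - best_rank_k(T, 6), "fro")   # ||T - T_6||_F
print(numerical_rank, tail_k2, tail_k6)
```

Each cosine is a sum of two complex exponentials, so three distinct frequencies should give a matrix of rank 6: the tail $\|T-T_6\|_F$ is zero up to floating-point error, while $\|T-T_2\|_F$ is not. This sketch reads all $d^2$ entries and so serves only as a reference for the quantities in the error bound.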