Sublinear Time Low-Rank Approximation of Toeplitz Matrices

We present a sublinear time algorithm for computing a near-optimal low-rank approximation to any positive semidefinite (PSD) Toeplitz matrix $T\in \mathbb{R}^{d\times d}$, given noisy access to its entries. In particular, given entrywise query access to $T+E$ for an arbitrary noise matrix $E\in \mathbb{R}^{d\times d}$, integer rank $k\leq d$, and error parameter $\delta>0$, our algorithm runs in time $\text{poly}(k,\log(d/\delta))$ and outputs (in factored form) a Toeplitz matrix $\widetilde{T} \in \mathbb{R}^{d \times d}$ with rank $\text{poly}(k,\log(d/\delta))$ satisfying, for some fixed constant $C$, \begin{equation*} \|T-\widetilde{T}\|_F \leq C \cdot \max\{\|E\|_F,\|T-T_k\|_F\} + \delta \cdot \|T\|_F. \end{equation*} Here $\|\cdot \|_F$ is the Frobenius norm and $T_k$ is the best (not necessarily Toeplitz) rank-$k$ approximation to $T$ in the Frobenius norm, given by projecting $T$ onto its top $k$ eigenvectors. Our result has the following applications. When $E = 0$, we obtain the first sublinear time near-relative-error low-rank approximation algorithm for PSD Toeplitz matrices, resolving the main open problem of Kapralov et al. (SODA '23), whose algorithm had sublinear query complexity but exponential runtime. Our algorithm can also be applied to approximate the unknown Toeplitz covariance matrix of a multivariate Gaussian distribution, given sample access to this distribution, resolving an open question of Eldar et al. (SODA '20). Our algorithm applies sparse Fourier transform techniques to recover a low-rank Toeplitz matrix using its Fourier structure. Our key technical contribution is the first polynomial time algorithm for \emph{discrete time off-grid} sparse Fourier recovery, which may be of independent interest.
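The Fourier structure mentioned above can be illustrated with a small numerical sketch. It is not the algorithm of this paper: it builds the matrix densely in $O(d^2)$ time, only to show that a nonnegative combination of a few cosines at off-grid frequencies gives a low-rank PSD Toeplitz matrix, and to compute the benchmark $\|T-T_k\|_F$ from the error bound. The function names, the dimension, and the frequencies and weights are hypothetical choices made for illustration.

```python
import numpy as np


def toeplitz_from_frequencies(d, freqs, weights):
    """Return T with T[i, j] = sum_m weights[m] * cos(2*pi*freqs[m]*(i - j)).

    Each cosine term is the real part of a rank-one outer product of a
    complex exponential with itself, so with nonnegative weights T is
    symmetric, PSD, Toeplitz, and has rank at most 2 * len(freqs).
    """
    lags = np.arange(d)[:, None] - np.arange(d)[None, :]
    T = np.zeros((d, d))
    for f, w in zip(freqs, weights):
        T += w * np.cos(2 * np.pi * f * lags)
    return T


def best_rank_k(T, k):
    """Project a symmetric PSD matrix onto its top-k eigenvectors."""
    vals, vecs = np.linalg.eigh(T)
    top = np.argsort(vals)[::-1][:k]
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T


# Illustrative parameters (assumptions, not taken from the paper).
d = 64
freqs = np.array([0.1037, 0.2719, 0.4142])  # off-grid: not multiples of 1/d
weights = np.array([3.0, 1.5, 0.5])

T = toeplitz_from_frequencies(d, freqs, weights)

# Benchmark error ||T - T_k||_F for a small k and for k = 2 * (number of frequencies).
tail_error = np.linalg.norm(T - best_rank_k(T, 2), "fro")
full_error = np.linalg.norm(T - best_rank_k(T, 2 * len(freqs)), "fro")

print("numerical rank:", np.linalg.matrix_rank(T))
print("||T - T_2||_F :", tail_error)
print("||T - T_6||_F :", full_error)
```

In this toy setting the matrix is determined by three frequencies and three weights, which is the kind of compact description that sparse Fourier recovery aims to find from few entries.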
