Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in the dimension $M$. Yet to date, the only algorithms that provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which require super-polynomial time to obtain an error that decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on a family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property'': the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
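To make the sampling model $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$ concrete, the following is a minimal sketch of how synthetic data of this form could be generated. It is not the SPORADIC algorithm itself. The function name `sample_sparse_model`, the Gaussian dictionary with unit-norm columns, the $\pm 1$ nonzero values, and the specific sizes are all assumptions chosen for illustration; the abstract only requires that $\mathbf{D}$ satisfy RIP and that each $\mathbf{x}_i$ be $s$-sparse.

```python
import numpy as np


def sample_sparse_model(M, K, N, s, seed=0):
    """Draw a dictionary D (M x K), sparse codes X (K x N), and samples Y = D X.

    Illustrative assumptions (not taken from the paper):
      * D has i.i.d. Gaussian entries, then columns are scaled to unit norm.
      * Each x_i has exactly s nonzeros on a uniformly random support.
      * Nonzero values are independent random signs (+1 or -1).
    """
    rng = np.random.default_rng(seed)

    # Overcomplete dictionary when K > M; normalize each column (atom).
    D = rng.standard_normal((M, K))
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    # Build the s-sparse coefficient vectors column by column.
    X = np.zeros((K, N))
    for i in range(N):
        support = rng.choice(K, size=s, replace=False)
        X[support, i] = rng.choice([-1.0, 1.0], size=s)

    # Observed samples: column i of Y is y_i = D x_i.
    Y = D @ X
    return D, X, Y


# Example sizes (hypothetical): M = 64, K = 96 > M, N = 500 samples, s = 8.
D, X, Y = sample_sparse_model(M=64, K=96, N=500, s=8)
```

A dictionary learning method sees only `Y` and must estimate `D` and `X`. In the language of the abstract, the oracle property concerns exact recovery of the support and signs of the columns of `X`.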