Dictionary Learning for the Almost-Linear Sparsity Regime

Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in the dimension $M$. Yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which require super-polynomial time to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on a family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property'': the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
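To make the sampling model $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$ concrete, the following is a minimal NumPy sketch, not the SPORADIC algorithm itself. It assumes a Gaussian dictionary with unit-norm columns and $s$-sparse vectors with uniformly random supports and $\pm 1$ entries; these distributional choices, the function names `sample_sparse_model` and `reweighted_covariance`, and the particular weights used at the end are illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def sample_sparse_model(M, K, s, N, seed=0):
    """Draw a dictionary D (M x K), s-sparse codes X (K x N), and samples Y = D X.

    Assumption: Gaussian dictionary with normalized columns and
    random-support, random-sign codes (not specified in the abstract).
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, K))
    D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm columns
    X = np.zeros((K, N))
    for i in range(N):
        support = rng.choice(K, size=s, replace=False)  # s distinct indices
        X[support, i] = rng.choice([-1.0, 1.0], size=s)  # random signs
    Y = D @ X  # column i is y_i = D x_i
    return D, X, Y

def reweighted_covariance(Y, weights):
    """Generic weighted sample covariance (1/N) * sum_i w_i y_i y_i^T."""
    N = Y.shape[1]
    return (Y * weights) @ Y.T / N  # weights broadcast across columns of Y

D, X, Y = sample_sparse_model(M=64, K=128, s=8, N=1000)

# Hypothetical weights for illustration only: squared correlation of
# every sample with the first sample.
weights = (Y.T @ Y[:, 0]) ** 2
A = reweighted_covariance(Y, weights)
```

A spectral method would then examine the leading eigenvectors of matrices such as `A`; how SPORADIC chooses its weights and post-processes the spectrum is not described in the abstract and is left out of this sketch.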
