Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression

We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and the additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{\mathrm{SNR}^2}$-to-$\frac{k^2}{\mathrm{SNR}^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $\mathrm{SNR}$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time using the square root of the number of samples, matching the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms for solving exact signed support recovery in sparse linear regression.
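To give a feel for what an $O(np)$ thresholding procedure can look like, the following is a minimal sketch of generic correlation thresholding. The abstract does not spell out the algorithm, so this is an illustration under stated assumptions and not the authors' method: the names `correlation_threshold` and `simulate_mixed`, the i.i.d. standard Gaussian design, the mixing proportion `phi`, and the threshold `tau` are all hypothetical choices made for this example.

```python
import numpy as np


def correlation_threshold(X, y, tau):
    """Return a length-p vector in {-1, 0, +1}: thresholded signs of X^T y / n.

    Illustrative sketch only; tau is a user-chosen threshold (assumption).
    """
    n = X.shape[0]
    corr = X.T @ y / n  # a single matrix-vector product, O(np) time
    return np.where(np.abs(corr) > tau, np.sign(corr), 0).astype(int)


def simulate_mixed(n, p, beta1, beta2, phi, sigma, rng):
    """Draw n unlabelled samples from a two-component mixed linear model.

    Each sample uses beta1 with probability phi and beta2 otherwise
    (hypothetical data model with an i.i.d. standard Gaussian design).
    """
    X = rng.standard_normal((n, p))
    use_first = rng.random(n) < phi
    signal = np.where(use_first, X @ beta1, X @ beta2)
    y = signal + sigma * rng.standard_normal(n)
    return X, y
```

Under this Gaussian design, each coordinate of `X.T @ y / n` concentrates around the corresponding coordinate of $\phi\beta_1 + (1-\phi)\beta_2$. In the symmetric case $\beta_1 = -\beta_2$ with $\phi = 1/2$ this average vanishes, so a first-moment statistic of this kind carries no signal there, which is consistent with the hard regime being a narrow symmetric one. Setting `beta1 == beta2` reduces the sketch to signed support recovery in ordinary sparse linear regression.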
