We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask how many samples are needed to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate of the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically, this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite the problem's prominence in the literature, no polynomial-time algorithm is known to achieve the same guarantees using fewer than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results either are restricted to the proper setting, in which the estimate must be sparse as well, or apply only to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting, in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.
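To make the setting concrete, the following is a minimal sketch of the data model and the error measure described above, not the construction used in the reduction. It assumes a regression vector with $\pm 1$ entries on a random support of size $k$, Gaussian label noise, and, as one illustrative choice of correlated design, a covariance of the form $I - \theta v v^\top$ for a $k$-sparse unit vector $v$ and $0 \le \theta < 1$. The function names and all parameter values are hypothetical.

```python
import math
import random


def sample_sparse_regression(n, d, k, theta=0.5, noise_std=0.1, seed=0):
    """Draw n samples (x_i, y_i) with y_i = <x_i, beta> + noise, beta k-sparse.

    Covariates are N(0, Sigma) with Sigma = I - theta * v v^T, an illustrative
    (assumed) correlated design; v is a k-sparse unit vector, 0 <= theta < 1.
    """
    rng = random.Random(seed)
    # k-sparse regression vector with +-1 entries on its support (assumption).
    beta = [0.0] * d
    for j in rng.sample(range(d), k):
        beta[j] = rng.choice([-1.0, 1.0])
    # k-sparse unit vector defining the covariance.
    v = [0.0] * d
    for j in rng.sample(range(d), k):
        v[j] = 1.0 / math.sqrt(k)
    # (I - c v v^T)^2 = I - theta v v^T when c = 1 - sqrt(1 - theta),
    # so x = (I - c v v^T) z has covariance Sigma for standard Gaussian z.
    c = 1.0 - math.sqrt(1.0 - theta)
    X, y = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        proj = sum(vj * zj for vj, zj in zip(v, z))
        x = [zj - c * proj * vj for zj, vj in zip(z, v)]
        label = sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, noise_std)
        X.append(x)
        y.append(label)
    return X, y, beta


def prediction_error(X, beta_hat, beta):
    """In-sample prediction error (1/n) * ||X (beta_hat - beta)||^2."""
    diff = [bh - b for bh, b in zip(beta_hat, beta)]
    return sum(sum(xj * dj for xj, dj in zip(x, diff)) ** 2 for x in X) / len(X)
```

For example, with `X, y, beta = sample_sparse_regression(n=50, d=20, k=3)`, the value `prediction_error(X, [0.0] * 20, beta)` is the error of the trivial all-zero estimate. In this sketch, an estimate `beta_hat` (which may be dense) is non-trivial when its error falls meaningfully below that baseline.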