We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask for the minimum sample complexity needed to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate of the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically, this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite its prominence in the literature, no polynomial-time algorithm is known to achieve the same guarantee using fewer than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or apply only to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting, in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.
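For concreteness, a minimal formalization of the two models in play follows; the Gaussian noise model and the unit-norm spike normalization are standard conventions assumed here for illustration, not spelled out in the statement above. The $k$-sparse linear regression model is
\[
y_i = \langle x_i, \beta^\star \rangle + \xi_i, \qquad \|\beta^\star\|_0 \le k, \qquad x_i \overset{\mathrm{i.i.d.}}{\sim} N(0, \Sigma), \qquad \xi_i \sim N(0, \sigma^2),
\]
and the Wishart form of sparse PCA with a negative spike asks to detect (or recover) the sparse direction $v$ given
\[
x_1, \dots, x_n \overset{\mathrm{i.i.d.}}{\sim} N\bigl(0,\; I_d + \theta\, v v^\top\bigr), \qquad \theta \in (-1, 0), \qquad \|v\|_2 = 1, \qquad \|v\|_0 \le k.
\]
Note that $\theta > -1$ keeps the covariance positive definite, so the spiked distribution is a valid correlated Gaussian design of the kind covered by our hardness results.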