In this work, we tackle the following question: Can neural networks trained with gradient-based methods achieve the optimal computational-statistical tradeoff in learning Gaussian single-index models? Prior research has shown that any polynomial-time algorithm in the statistical query (SQ) framework requires $\Omega(d^{s^\star/2}\lor d)$ samples, where $s^\star$ is the generative exponent, which captures the intrinsic difficulty of learning the underlying model. However, it remains unknown whether neural networks can achieve this sample complexity. Inspired by prior techniques for learning single-index models, such as label transformation and landscape smoothing, we propose a unified gradient-based algorithm that trains a two-layer neural network in polynomial time. Our method accommodates a variety of loss and activation functions, covering a broad class of existing approaches. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\theta^\star$ using $\widetilde{O}(d^{s^\star/2} \lor d)$ samples, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\star\geq 1$. Furthermore, we extend our approach to the setting where $\theta^\star$ is $k$-sparse with $k = o(\sqrt{d})$ by introducing a novel weight perturbation technique that leverages the sparsity structure. We derive a corresponding SQ lower bound of order $\widetilde{\Omega}(k^{s^\star})$, which our method matches up to a polylogarithmic factor. Our framework, especially the weight perturbation technique, is of independent interest and suggests potential gradient-based solutions to other problems such as sparse tensor PCA.
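To make the problem setting concrete, the following is a minimal sketch, under our own assumptions, of the data model and the quantities the abstract refers to. It is not the proposed training algorithm. It draws a $k$-sparse unit signal $\theta^\star$, samples $x \sim N(0, I_d)$ with label $y = \sigma(\langle \theta^\star, x \rangle)$, measures the alignment $|\langle w, \theta^\star\rangle| / \|w\|$ of a candidate direction $w$, and evaluates the order $d^{s^\star/2} \lor d$ of the SQ sample bound with constants and polylogarithmic factors dropped. The helper names (`sample_theta`, `sample_data`, `alignment`, `sq_sample_scale`) and the example link function $\sigma(z) = z^2 - 1$ are hypothetical choices made for illustration.

```python
import math
import random


def sample_theta(d, k, rng):
    """Hypothetical helper: draw a k-sparse unit vector theta* in R^d
    (support and signs chosen uniformly at random)."""
    support = rng.sample(range(d), k)
    theta = [0.0] * d
    for i in support:
        theta[i] = rng.choice([-1.0, 1.0]) / math.sqrt(k)
    return theta


def sample_data(theta, n, link, rng):
    """Hypothetical helper: return n pairs (x, y) with x ~ N(0, I_d)
    and y = link(<theta, x>), i.e. a Gaussian single-index model."""
    d = len(theta)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = sum(t * xi for t, xi in zip(theta, x))
        data.append((x, link(z)))
    return data


def alignment(w, theta):
    """|<w, theta>| / ||w||; for a unit-norm theta this equals 1
    exactly when w is parallel to +theta or -theta."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * ti for wi, ti in zip(w, theta))) / norm_w


def sq_sample_scale(d, s_star):
    """Order of the SQ lower bound d^{s*/2} v d, ignoring constants
    and polylogarithmic factors."""
    return max(d ** (s_star / 2), d)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = sample_theta(d=50, k=5, rng=rng)
    # Assumed link function for illustration only: sigma(z) = z^2 - 1.
    data = sample_data(theta, n=10, link=lambda z: z * z - 1, rng=rng)
    print(alignment(theta, theta))  # close to 1 up to floating-point error
    print(sq_sample_scale(100, 4))  # d^{s*/2} dominates when s* > 2
```

For $s^\star \le 2$ the bound is dominated by the linear term $d$, while for $s^\star > 2$ the $d^{s^\star/2}$ term dominates; the sketch's `max` simply encodes the $\lor$ in the abstract's bound.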