We focus on the task of learning a single-index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient-based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.
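To make the definition of the information exponent concrete, the following is a minimal numerical sketch, not part of the paper's method. It estimates the normalized Hermite coefficients $\mathbb{E}_{z \sim N(0,1)}[\sigma(z)\,\mathrm{He}_k(z)]/\sqrt{k!}$ by Gauss-Hermite quadrature and returns the first index with a nonzero coefficient. The function names, the tolerance `tol`, the truncation `max_degree`, and the convention of starting the search at $k = 1$ (skipping the constant term) are assumptions made for illustration.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(sigma, max_degree=10, quad_points=64):
    """Estimate E[sigma(z) He_k(z)] / sqrt(k!) for k = 0..max_degree, z ~ N(0, 1)."""
    # Nodes and weights for the weight function exp(-z^2 / 2).
    nodes, weights = hermegauss(quad_points)
    # Rescale so the weights sum to 1, i.e. they integrate against the N(0, 1) density.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = sigma(nodes)

    coeffs = np.zeros(max_degree + 1)
    for k in range(max_degree + 1):
        # Coefficient vector selecting the k-th probabilists' Hermite polynomial.
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        he_k = hermeval(nodes, basis)
        coeffs[k] = np.sum(weights * values * he_k) / math.sqrt(math.factorial(k))
    return coeffs


def information_exponent(sigma, max_degree=10, tol=1e-8):
    """Return the smallest k >= 1 whose Hermite coefficient exceeds tol in magnitude.

    Assumption: the constant (k = 0) term is ignored. Returns None if no
    coefficient up to max_degree is detected.
    """
    coeffs = hermite_coefficients(sigma, max_degree=max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None
```

For example, under these assumptions `information_exponent(lambda z: z**3 - 3*z)` identifies the link function $\mathrm{He}_3$ as having $k^\star = 3$, the regime in which the gap between $d^{k^\star-1}$ and $d^{k^\star/2}$ first appears. The result depends on the chosen tolerance, so a coefficient that is nonzero but very small would be missed.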