Recent works have demonstrated that the sample complexity of gradient-based learning of single-index models, i.e., functions that depend on a one-dimensional projection of the input data, is governed by their information exponent. However, these results only concern isotropic data, while in practice the input often contains additional structure that can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also beating the lower bounds that hold for rotationally invariant kernel methods.
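To make the setting concrete, the following is a minimal sketch of how data from a spiked-covariance single-index model might be sampled. It is an illustration under stated assumptions, not the construction used in this work: the covariance form `I + kappa * theta theta^T`, the perfectly aligned spike (`theta == u`), the unit-variance rescaling of the projection, the cubic Hermite link `he3` (information exponent 3), and all function and variable names are hypothetical choices made for this example.

```python
import numpy as np


def he3(t):
    # Third probabilists' Hermite polynomial; as a link function its
    # first nonzero Hermite coefficient is at degree 3 (information exponent 3).
    return t**3 - 3.0 * t


def sample_spiked_single_index(n, d, kappa, link, rng):
    # Assumption: the target direction u is the first coordinate axis and
    # the spike direction theta is perfectly aligned with it.
    u = np.zeros(d)
    u[0] = 1.0
    theta = u.copy()

    # Isotropic Gaussian base samples.
    z = rng.standard_normal((n, d))

    # x = (I + (sqrt(1 + kappa) - 1) theta theta^T) z
    # has covariance I + kappa * theta theta^T (assumed spiked model).
    scale = np.sqrt(1.0 + kappa) - 1.0
    x = z + scale * (z @ theta)[:, None] * theta[None, :]

    # Rescale the projection to unit variance before applying the link
    # (an assumption of this sketch).
    proj = (x @ u) / np.sqrt(1.0 + kappa * (u @ theta) ** 2)
    y = link(proj)
    return x, y, u


rng = np.random.default_rng(0)
x, y, u = sample_spiked_single_index(n=20000, d=16, kappa=8.0, link=he3, rng=rng)

# Empirical variance along the spike versus an orthogonal coordinate.
var_spike = np.var(x @ u)
var_other = np.var(x[:, 1])
```

With `kappa = 8.0`, the empirical variance along the spike direction should be close to `1 + kappa`, while orthogonal coordinates keep variance close to 1. That gap is the anisotropy the abstract refers to.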