We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which quantifies how "hierarchical" a target function is. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f),2)})$. We prove a version of this conjecture for a class of functions on isotropic Gaussian data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that training learns the support of the function sequentially, following saddle-to-saddle dynamics. Our result departs from [Abbe et al. 2022] in two ways: it goes beyond leap 1 (merged-staircase functions), and it goes beyond the mean-field and gradient-flow approximations, which preclude the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches Correlational Statistical Query (CSQ) lower bounds.
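The abstract does not spell out how the leap is computed. The sketch below assumes the Boolean-data definition: if $f$ has Fourier support $\{S_1, \dots, S_m\}$ (sets of coordinate indices), then $\mathrm{Leap}(f)$ is the minimum, over all orderings of the support, of the largest number of new coordinates that any single monomial introduces. The Gaussian case uses a variant of this definition that is not covered here. The function names `leap` and `sgd_time_exponent` are hypothetical. The search tries every ordering by brute force, so it is meant only for small supports. It also reports only the exponent $\max(\mathrm{Leap}(f), 2)$ and ignores the polylogarithmic factors hidden in $\tilde\Theta$.

```python
from itertools import permutations


def leap(support):
    """Leap of a Boolean function, given its Fourier support as sets of indices.

    Assumed definition: min over orderings of max |S_i minus (S_1 ∪ ... ∪ S_{i-1})|.
    """
    sets = [frozenset(s) for s in support]

    def max_increment(order):
        # Largest number of previously unseen coordinates added by one monomial.
        covered, worst = set(), 0
        for s in order:
            worst = max(worst, len(s - covered))
            covered |= s
        return worst

    # Brute force over all orderings; exponential in the support size.
    return min(max_increment(order) for order in permutations(sets))


def sgd_time_exponent(support):
    """Exponent k in the conjectured Θ~(d^k) SGD time complexity (hypothetical helper)."""
    return max(leap(support), 2)


if __name__ == "__main__":
    staircase = [{1}, {1, 2}, {1, 2, 3}]   # x1 + x1x2 + x1x2x3: leap 1
    jump = [{1}, {1, 2, 3}]                # x1 + x1x2x3: leap 2
    parity = [{1, 2, 3}]                   # x1x2x3: leap 3
    for name, s in [("staircase", staircase), ("jump", jump), ("parity", parity)]:
        print(name, leap(s), sgd_time_exponent(s))
```

In these examples the staircase has leap 1 but is still assigned exponent 2, reflecting the $\max(\cdot, 2)$ floor in the conjecture. The degree-3 parity, which has no lower-degree monomials to build on, needs $d^3$ time up to log factors.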