We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which quantifies how "hierarchical" a target function is. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta(d^{\max(\mathrm{Leap}(f),2)})$. We prove a version of this conjecture for a class of functions on isotropic Gaussian data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from [Abbe et al. 2022] by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations, which do not allow the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower bounds.
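To make the conjectured exponent concrete, here is a minimal Python sketch for the uniform Boolean case. The abstract does not define the leap, so the definition used here is an assumption: $f$ is represented by the supports of its nonzero monomials, and the leap is the minimum, over all orderings of those monomials, of the largest number of new coordinates that any single monomial introduces. The function names `leap` and `conjectured_exponent` are hypothetical, and the search is brute force, so it is only practical for a handful of monomials.

```python
from itertools import permutations


def leap(monomials):
    """Leap of a function given by the supports of its nonzero monomials.

    Assumed definition (not stated in the abstract): the minimum, over all
    orderings of the monomials, of the maximum number of previously unseen
    coordinates that a single monomial adds.
    """
    # Drop the constant term (empty support); it adds no coordinates.
    supports = [frozenset(s) for s in monomials if len(s) > 0]
    if not supports:
        return 0
    best = None
    # Brute force over orderings: factorial cost, fine for small examples.
    for order in permutations(supports):
        seen = set()
        worst = 0
        for s in order:
            worst = max(worst, len(s - seen))  # new coordinates at this step
            seen |= s
        best = worst if best is None else min(best, worst)
    return best


def conjectured_exponent(monomials):
    """Exponent max(Leap(f), 2) appearing in the conjectured d-dependence."""
    return max(leap(monomials), 2)


# x1 + x1*x2 + x1*x2*x3: each monomial adds one new coordinate.
staircase = [{1}, {1, 2}, {1, 2, 3}]
# x1*x2*x3 alone: all three coordinates must be found at once.
parity = [{1, 2, 3}]

print(leap(staircase), conjectured_exponent(staircase))
print(leap(parity), conjectured_exponent(parity))
```

Under this assumed definition, the staircase example has leap 1 and the isolated degree-3 monomial has leap 3, so the conjecture would give time complexities of order $d^2$ and $d^3$ respectively, up to logarithmic factors.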