Learning time-scales in two-layers neural networks

Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the rate of decrease of the empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically "simpler" or "easier to learn", although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.
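The setting described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code or their exact model: the tanh activation, the tanh link function, the Gaussian covariates, the mean-field `1/m` output scaling, and all sizes and step sizes are assumptions chosen for illustration, and the function names `make_single_index_data` and `train_two_layer` are hypothetical. Full-batch gradient descent with a small step stands in for gradient flow, and the empirical risk stands in for the population risk.

```python
import numpy as np

def make_single_index_data(n, d, rng, link=np.tanh):
    """Sample x ~ N(0, I_d) and y = link(<u, x>) for a fixed unit vector u.

    The target depends on x only through the one-dimensional projection <u, x>.
    The Gaussian covariates and the tanh link are illustrative assumptions.
    """
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    X = rng.standard_normal((n, d))
    y = link(X @ u)
    return X, y, u

def train_two_layer(X, y, m=50, lr=0.05, steps=500, rng=None):
    """Full-batch gradient descent on a two-layer network with squared loss.

    Model (assumed): f(x) = (1/m) * sum_i a_i * tanh(<w_i, x>).
    Returns the trained weights and the empirical risk recorded at every step.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    W = rng.standard_normal((m, d)) / np.sqrt(d)  # first-layer weights, shape (m, d)
    a = rng.standard_normal(m)                    # second-layer weights, shape (m,)
    risks = []
    for _ in range(steps):
        H = np.tanh(X @ W.T)                      # hidden activations, shape (n, m)
        resid = H @ a / m - y                     # prediction error, shape (n,)
        risks.append(0.5 * np.mean(resid ** 2))
        # Gradients of the empirical risk (1/2n) * sum_j resid_j^2.
        grad_a = H.T @ resid / (n * m)
        grad_W = ((resid[:, None] * (1.0 - H ** 2)) * a[None, :]).T @ X / (n * m)
        # Multiply the step by m so that each neuron moves at O(1) speed
        # (a common mean-field time scaling; an assumption here).
        a -= lr * m * grad_a
        W -= lr * m * grad_W
    return W, a, np.array(risks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, u = make_single_index_data(n=500, d=10, rng=rng)
    W, a, risks = train_two_layer(X, y, rng=rng)
    print("initial risk:", risks[0], "final risk:", risks[-1])
```

Plotting `risks` against the step index on a logarithmic time axis is one way to look for plateaus followed by rapid drops. Whether they appear in a toy run like this depends on the link function, the dimension, and the initialization, so the sketch should not be read as reproducing the paper's results.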

