A central question in deep learning is how deep neural networks (DNNs) learn features. DNN layers progressively collapse data into a regular low-dimensional geometry. This collective effect of non-linearity, noise, learning rate, width, depth, and numerous other parameters has eluded first-principles theories built from microscopic neuronal dynamics. Here we present a noise-non-linearity phase diagram that highlights where shallow or deep layers learn features more effectively. We then propose a macroscopic mechanical theory of feature learning that accurately reproduces this phase diagram, offering a clear intuition for why and how some DNNs are ``lazy'' and some are ``active'', and relating the distribution of feature learning over layers to test accuracy.