A Dynamical Systems Perspective on the Analysis of Neural Networks

In this chapter, we use tools from dynamical systems theory to analyze several aspects of machine learning algorithms. As an expository contribution, we demonstrate how to reformulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics as dynamical statements. We also tackle three concrete challenges. First, we consider how information propagates through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property of augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory dependence in neural delay equations. Second, we view the training of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge-of-stability phenomenon, including its role in possible explanations of implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results on mean-field limits of neural networks. We describe a result that uses digraph measures to extend existing techniques to heterogeneous neural networks involving graph limits. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies, which use dynamics to study explainable and reliable AI, also apply to settings such as generative models and to fundamental issues in gradient-based training, such as backpropagation and vanishing/exploding gradients.
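As a minimal illustration of the dynamical view of gradient descent (a sketch under stated assumptions, not the chapter's own analysis), consider the hypothetical one-dimensional quadratic loss L(theta) = 0.5 * lam * theta**2. Gradient descent with step size eta is the forward-Euler discretization of the gradient flow d(theta)/dt = -L'(theta), and each step is the linear map theta -> (1 - eta*lam) * theta. The minimizer is therefore a stable fixed point exactly when |1 - eta*lam| < 1, i.e. eta < 2/lam for lam > 0. This 2/eta sharpness threshold is the one that discussions of the edge-of-stability phenomenon refer to. The function names `gd_trajectory` and `is_linearly_stable` are illustrative only.

```python
# Minimal sketch. Assumption: a 1-D quadratic loss L(theta) = 0.5 * lam * theta**2,
# chosen purely for illustration (not taken from the chapter).
# Gradient descent theta_{k+1} = theta_k - eta * L'(theta_k) is the forward-Euler
# discretization of the gradient flow d(theta)/dt = -L'(theta).


def gd_trajectory(lam: float, eta: float, theta0: float, steps: int) -> list[float]:
    """Iterate gradient descent on L(theta) = 0.5 * lam * theta**2 and record every iterate."""
    theta = theta0
    traj = [theta]
    for _ in range(steps):
        grad = lam * theta          # L'(theta) for the quadratic loss
        theta = theta - eta * grad  # equivalent to theta -> (1 - eta * lam) * theta
        traj.append(theta)
    return traj


def is_linearly_stable(lam: float, eta: float) -> bool:
    """The minimizer theta* = 0 is a stable fixed point of the GD map iff |1 - eta*lam| < 1."""
    return abs(1.0 - eta * lam) < 1.0


if __name__ == "__main__":
    lam = 4.0  # curvature ("sharpness") of the hypothetical loss; threshold 2 / lam = 0.5
    for eta in (0.1, 0.45, 0.55):
        traj = gd_trajectory(lam, eta, theta0=1.0, steps=20)
        print(f"eta={eta}: stable={is_linearly_stable(lam, eta)}, |theta_20|={abs(traj[-1]):.3g}")
```

With lam = 4, the step size 0.1 contracts monotonically, 0.45 still converges but oscillates in sign (multiplier -0.8), and 0.55 diverges (multiplier -1.2). Crossing 2/lam thus turns the fixed point from attracting into repelling.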

翻译：在本章中，我们利用动力系统理论分析机器学习算法的若干方面。作为阐述性贡献，我们展示了如何将深度神经网络、（随机）梯度下降及相关主题中的广泛挑战重新表述为动力学命题。同时，我们针对三个具体问题展开研究。其一，考虑信息在神经网络中的传播过程，即研究不同架构下的输入-输出映射。我们阐释了表示给定正则性任意函数的增广神经常微分方程的通用嵌入性质、基于合适函数类对多层感知机与神经常微分方程的分类，以及神经延迟方程中的记忆依赖性。其二，从动力学角度研究神经网络的训练过程。我们提出梯度下降的动力学系统视角，并研究超定问题的稳定性。随后将该分析扩展至过参数化场景，描述稳定性边缘现象及其在隐式偏差可能解释中的作用。针对随机梯度下降，通过插值解的Lyapunov指数给出过参数化场景的稳定性结果。其三，阐释神经网络平均场极限的若干结论。我们描述了一项将现有技术通过有向图测度扩展至包含图极限的异质神经网络的研究成果，揭示了如何使大规模神经网络自然符合图上的Kuramoto模型框架及其大图极限。最后指出，类似利用动力学研究可解释与可信人工智能的策略，还可应用于生成模型等场景，以及反向传播、梯度消失/爆炸等梯度训练方法的基础性问题。