This paper presents a mathematical analysis of ODE-Net, a continuum model of deep neural networks (DNNs). In recent years, machine learning researchers have proposed replacing the deep structure of DNNs with ODEs, viewed as a continuum limit. These studies regard the "learning" of ODE-Net as the minimization of a "loss" constrained by a parametric ODE. Although such studies must assume that this minimization problem has a minimizer, only a few have investigated its existence analytically in detail. In the present paper, the existence of a minimizer is discussed based on a formulation of ODE-Net as a measure-theoretic mean-field optimal control problem. First, existence is proved when the neural network that describes the vector field of ODE-Net is linear with respect to its learnable parameters. The proof combines the measure-theoretic formulation with the direct method of the calculus of variations. Second, an idealized minimization problem is proposed to remove the above linearity assumption. This problem is inspired by a kinetic regularization associated with the Benamou--Brenier formula and by universal approximation theorems for neural networks. The proofs of these existence results use variational methods, differential equations, and mean-field optimal control theory. They offer a new analytic approach to investigating the learning process of deep neural networks.
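For orientation, the objects named in the abstract can be sketched as follows. The notation here ($\ell$, $\mu_0$, $\theta$, $v$, $\sigma$, $T$) is assumed for illustration and is not taken from the paper itself; only the Benamou--Brenier formula is stated in its standard form.

```latex
% Learning as an ODE-constrained minimization (illustrative notation, assumed):
% data (x_0, y) are distributed according to \mu_0, and \theta(t) are the parameters.
\begin{align*}
  \min_{\theta}\ & \int \ell\bigl(x(T), y\bigr)\, d\mu_0(x_0, y) \\
  \text{subject to}\ & \dot{x}(t) = v\bigl(x(t), \theta(t)\bigr), \qquad x(0) = x_0 .
\end{align*}

% Measure-theoretic (mean-field) form: the distribution \rho_t of x(t)
% is transported by the same vector field via the continuity equation.
\[
  \partial_t \rho_t + \nabla \cdot \bigl( \rho_t \, v(\cdot, \theta(t)) \bigr) = 0 .
\]

% One hypothetical vector field that is linear in the parameters:
% \theta is a matrix and \sigma a fixed componentwise activation.
\[
  v(x, \theta) = \theta\, \sigma(x) .
\]

% Benamou--Brenier formula (standard form on [0,1]): the squared
% 2-Wasserstein distance as a minimal kinetic energy.
\[
  W_2^2(\mu_0, \mu_1)
  = \min_{(\rho, v)} \left\{ \int_0^1 \!\! \int |v_t(x)|^2 \, d\rho_t(x)\, dt
    \;:\; \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0,\
    \rho_0 = \mu_0,\ \rho_1 = \mu_1 \right\} .
\]
```

In this reading, a kinetic regularization would add a term of the form $\int_0^T \!\int |v|^2 \, d\rho_t \, dt$ to the loss; the precise functional and assumptions used in the paper may differ from this sketch.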