This paper proposes Neural Energy Descent (NED), a method based on neural network evolution equations, for a wide class of deep learning problems. We show that deep learning can be reformulated as the evolution of network parameters governed by an evolution equation, and that the steady-state solution of this partial differential equation (PDE) provides a solution to the deep learning problem. The equation corresponds to the gradient descent flow of a variational problem; hence the proposed time-dependent PDE solves an energy minimization problem to obtain a global minimizer of the deep learning problem. This gives a novel interpretation of, and solution to, deep learning optimization. The computational cost of the proposed energy descent method can be reduced by randomly sampling the spatial domain of the PDE, leading to an efficient NED. Numerical examples demonstrate the numerical advantage of NED over stochastic gradient descent (SGD).
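As a rough illustration of the gradient descent flow idea in the abstract, the sketch below integrates d(theta)/dt = -grad E(theta) with explicit Euler steps until it approaches a steady state, where grad E(theta) = 0. It is not the paper's NED algorithm: the toy quadratic energy, the one-parameter model, the explicit Euler discretization, and the names `energy_grad` and `euler_gradient_flow` are all assumptions made for illustration. The optional `sample_size` argument mimics, in a very simplified way, the idea of randomly sampling the domain to lower the cost per step.

```python
import random


def energy_grad(theta, points):
    """Gradient of the toy energy E(theta) = mean of 0.5 * (theta * x - y)^2."""
    return sum((theta * x - y) * x for x, y in points) / len(points)


def euler_gradient_flow(points, theta0=0.0, dt=0.1, steps=500,
                        sample_size=None, seed=0):
    """Explicit Euler steps for d(theta)/dt = -grad E(theta).

    If sample_size is set, each step estimates the gradient on a random
    subset of the points (an assumed stand-in for sampling the domain).
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        batch = points if sample_size is None else rng.sample(points, sample_size)
        theta -= dt * energy_grad(theta, batch)
    return theta


# Toy data on (0, 1] generated by y = 2x, so the energy minimizer is theta = 2.
points = [(i / 20, 2.0 * i / 20) for i in range(1, 21)]

theta_full = euler_gradient_flow(points)                    # full gradient each step
theta_sampled = euler_gradient_flow(points, sample_size=5)  # 5 random points per step
```

In this noiseless toy problem both runs settle near the same steady state, while the sampled run evaluates only a quarter of the points per step. Real deep learning energies are non-convex, so this sketch says nothing about the global-minimizer claim made in the abstract.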