We derive explicit equations governing the cumulative biases and weights in Deep Learning with the ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the question of interpretability in supervised learning.
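As a schematic illustration only (not the paper's actual derivation), one may picture a gradient flow for a Euclidean cost acting on input-layer representations; the symbols $x_j$, $y_j$, and the constant rate below are illustrative placeholders, whereas the paper derives truncation rates that depend on the number of already-truncated data points:

```latex
% Schematic sketch, under assumed notation: x_j(t) denotes the evolving
% input-layer representation of the j-th datum, y_j its target. The
% Euclidean cost and its gradient flow read
\[
  \mathcal{C}[x] \;=\; \frac{1}{2}\sum_{j=1}^{N} \bigl| x_j - y_j \bigr|^2,
  \qquad
  \partial_t\, x_j(t) \;=\; -\,\nabla_{x_j}\,\mathcal{C}[x(t)]
  \;=\; -\,\bigl( x_j(t) - y_j \bigr),
\]
% so that each datum relaxes exponentially,
\[
  x_j(t) - y_j \;=\; e^{-t}\,\bigl( x_j(0) - y_j \bigr).
\]
```

In this toy picture the decay rate is uniform; the abstract's claim is stronger, namely that the effective exponential rate increases as more data points are truncated.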