A Deep Neural Network (DNN) is a composition of vector-valued functions, and training a DNN requires the gradient of the loss function with respect to all of its parameters. Computing this gradient is non-trivial because the loss of a DNN is a composition of several nonlinear functions, each with many parameters. The Backpropagation (BP) algorithm exploits this composite structure to compute the gradient efficiently. Its cost grows only linearly with the number of layers and stays within a small constant factor of the cost of a forward pass. The objective of this paper is to express the gradient of the loss function as a product of matrices using the Jacobian operator. To this end, the total derivative of each layer with respect to its input and its parameters is written as a Jacobian matrix, and the gradient is then obtained as the matrix product of these Jacobians. This approach is valid because the chain rule extends to compositions of vector-valued functions: the Jacobian of a composition is the product of the Jacobians of its components, and Jacobian matrices naturally accommodate functions with multiple inputs and multiple outputs. By giving concise mathematical justifications, we aim to make the results understandable and useful to a broad audience across disciplines.
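The idea of computing the gradient as a product of Jacobian matrices can be checked numerically. The sketch below is not the paper's derivation. It assumes a hypothetical two-layer network a_l = tanh(W_l a_{l-1} + b_l) with squared-error loss, stores matrices as nested Python lists, and propagates the row vector dL/da_l backward by multiplying it with each layer's Jacobians diag(1 - a_l^2) (with respect to the pre-activation) and diag(1 - a_l^2) W_l (with respect to the layer input). The resulting parameter gradients are compared against central finite differences. The names `gradients` and `numeric_grad` are illustrative only.

```python
import copy
import math
import random

# Hypothetical example: layers a_l = tanh(W_l a_{l-1} + b_l), loss L = 0.5 * ||a_L - t||^2.

def matmul(A, B):
    """Product of an (n x k) and a (k x m) matrix stored as lists of rows."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def diag(v):
    """Square diagonal matrix with v on the diagonal."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

def forward(params, x):
    """Return the activations [a_0, a_1, ..., a_L], with a_0 = x."""
    acts = [x]
    for W, b in params:
        z = [sum(W[i][j] * acts[-1][j] for j in range(len(acts[-1]))) + b[i]
             for i in range(len(W))]
        acts.append([math.tanh(zi) for zi in z])
    return acts

def loss(params, x, t):
    y = forward(params, x)[-1]
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

def gradients(params, x, t):
    """Backpropagation written as products of Jacobian matrices.

    g is dL/da_l as a 1 x n row vector. For each layer, from last to first:
      J_z = d a_l / d z_l = diag(1 - a_l**2)   (Jacobian of tanh)
      dL/dz_l = g @ J_z;  dL/db_l = dL/dz_l;  dL/dW_l[i][j] = dL/dz_l[i] * a_{l-1}[j]
      d a_l / d a_{l-1} = J_z @ W_l, so g <- g @ J_z @ W_l
    """
    acts = forward(params, x)
    g = [[yi - ti for yi, ti in zip(acts[-1], t)]]   # dL/da_L
    grads = []
    for l in range(len(params), 0, -1):
        W, _ = params[l - 1]
        a, a_prev = acts[l], acts[l - 1]
        J_z = diag([1.0 - ai * ai for ai in a])
        dz = matmul(g, J_z)[0]
        dW = [[dz[i] * a_prev[j] for j in range(len(a_prev))] for i in range(len(dz))]
        grads.append((dW, list(dz)))
        g = matmul([dz], W)                          # g @ J_z @ W_l (chain rule)
    return grads[::-1]                               # [(dW_1, db_1), ..., (dW_L, db_L)]

def numeric_grad(params, x, t, l, index, eps=1e-6):
    """Central difference for dL/dW_l[i][j] (index=(i, j)) or dL/db_l[i] (index=(i,))."""
    def shifted(delta):
        p = copy.deepcopy(params)
        if len(index) == 2:
            p[l][0][index[0]][index[1]] += delta
        else:
            p[l][1][index[0]] += delta
        return loss(p, x, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

random.seed(0)
sizes = [3, 4, 2]                                    # input, hidden, output widths (arbitrary)
params = [([[random.uniform(-1, 1) for _ in range(sizes[k])] for _ in range(sizes[k + 1])],
           [random.uniform(-1, 1) for _ in range(sizes[k + 1])])
          for k in range(len(sizes) - 1)]
x, t = [0.5, -0.2, 0.1], [0.3, -0.7]
grads = gradients(params, x, t)

max_err = max(
    abs(grads[l][0][i][j] - numeric_grad(params, x, t, l, (i, j)))
    for l in range(len(params))
    for i in range(len(params[l][0]))
    for j in range(len(params[l][0][0])))
print(f"max |backprop - finite difference| over weights: {max_err:.2e}")
```

Multiplying the row vector g from the left (a vector-Jacobian product), rather than first forming the full product of all Jacobians, is what keeps the backward pass cheap: each layer costs a vector-matrix product instead of a matrix-matrix product.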