We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields the (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters has full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
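To make the $L^{2}$ statement concrete, here is a minimal sketch in our own notation (the output vector $u$, the label vector $y$, and the reparametrized time $s$ are illustrative symbols, not taken from the paper): the Euclidean gradient flow of the loss $\tfrac{1}{2}\|u-y\|^{2}$ in output space is
\[
\dot{u}(t) = -\bigl(u(t)-y\bigr), \qquad u(t) = y + e^{-t}\bigl(u(0)-y\bigr),
\]
so the time change $s = 1 - e^{-t}$ turns the trajectory into the linear interpolation
\[
u(s) = (1-s)\,u(0) + s\,y,
\]
which attains the global minimum $u = y$ at $s = 1$ (i.e., as $t \to \infty$).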